Data sources have mostly turned digital








  • Analog processes
    • e.g., photography, films
  • Paper-based interactions
    • e.g., banking, e-administration
  • Communications
    • e.g., email, SMS, MMS, Skype
  • Where is your personal data? … In data centers

    • 112 new emails per day → mail servers
    • 65 SMS sent per day → telcos
    • 800 pages of social data → social networks
    • Web searches, list of purchases → Google, Amazon


Is this good news?

    • $2 billion a year is spent by US companies on third-party data about individuals (Forrester Report)
    • $44.25 is the estimated return on $1 invested in email marketing (oil is up to 0.5$/yr)
    • High market value companies
      • Facebook: value / #accounts = $50
      • Google: $38 billion business selling ads based on how people search the Web
      • Amazon (knows purchase intent), mail order systems companies (Gmail), loyalty programs (supermarkets), banks & insurance, employment market (LinkedIn, Viadeo), travel & transportation (voyages-sncf), the « love » market (Meetic), etc.




All these data analytics are run on « centralised » servers (e.g., data centers)

  • Intrinsic problem #1: personal data is exposed to sophisticated attacks

  • Intrinsic problem #2: personal data is hostage to sudden privacy changes

    • Centralised administration of data means delegation of control
    • This leads to regular changes, driven by application (and business) evolution, mergers and acquisitions, etc. (e.g., Facebook 2012)
  • Increasing security is only a partial solution, since it does not remove these intrinsic limitations

    • E.g., TrustedDB [BS12] proposes tamper-resistant hardware to secure outsourced centralized databases.


A Personal Data Ecosystem…

  • … built around user-centricity and trust,

  • achieved through a decentralized architecture

  • with the same computing expressivity







  • 1. Users store their own data
    • → minimizes abusive usage

  • 2. Auto-administered platform
    • → no DBA attack (even by the user)

  • 3. Enforce privacy principles for externalized (shared) data
    • → best if the recipient of the data is another TC

  • 4. Tamper resistance + certified code/secure execution + single user + physical access needed
    • → the cost/benefit ratio of an attack is very high







Token characteristics:

  • High security:

    • High cost/benefit ratio of an attack;
    • Secure against its owner;
  • Modest computing resources (~10Kb of RAM, 50MHz CPU);

  • Low availability: physically controlled by its owner; connects and disconnects at will



PROBLEM:

    • How to perform global queries on the asymmetric architecture (i.e., using data from many/all cells)?


Several approaches are possible to securely perform global computations:

    • Use only an untrusted server/cloud/P2P network with generic (and costly) algorithms, e.g., Secure Multi-Party Computation [Yao82, GMW87, CKL06] or fully homomorphic encryption [Gent09]. Problem = COST
    • Use only an untrusted server/cloud/P2P network and develop a specific algorithm for each class of queries or applications, e.g., DataMining Toolkit [CKV+02]. Problem = GENERICITY
    • Introduce a tangible element of trust through a trusted component, and develop a generic methodology to execute any centralized algorithm in this context [Katz07, GIS+10, AAB+10]. Problem = TRUST


  • Querier:

    • Shares the secret key with the TDSs (to encrypt the query and decrypt the result).
    • Classical access control policy (e.g., RBAC):
      • Cannot get the raw data stored in the TDSs (gets only the final result)
      • Can obtain only authorized views of the dataset (inference attacks are not considered)
  • Supporting Server Infrastructure (SSI):

    • Does not know the query (and hence the attributes in the GROUP BY clause), because the query is encrypted by the Querier before being sent to the SSI.
    • Has prior knowledge about the data distribution.
    • Honest-but-curious attacker: frequency-based attack
      • The SSI matches plaintexts and ciphertexts that have the same frequency.
      • e.g., it looks for remarkable (very high or very low) frequencies in the dataset distribution
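The frequency-based attack described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `det_enc` is a toy deterministic token built from HMAC (not the encryption scheme used by the authors), and the data values and prior distribution are invented.

```python
import hashlib
import hmac
from collections import Counter


def det_enc(key: bytes, value: str) -> str:
    """Toy deterministic 'encryption': equal plaintexts yield equal tokens."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def frequency_attack(ciphertexts, known_distribution):
    """Honest-but-curious SSI: pair ciphertexts and plaintexts by frequency rank."""
    observed = [c for c, _ in Counter(ciphertexts).most_common()]
    expected = sorted(known_distribution, key=known_distribution.get, reverse=True)
    return dict(zip(observed, expected))


key = b"secret shared by querier and TDSs"  # hypothetical key, unknown to the SSI
data = ["flu"] * 60 + ["asthma"] * 30 + ["rare-disease"] * 10
ciphertexts = [det_enc(key, v) for v in data]

# Prior knowledge of the SSI about the data distribution (invented numbers).
prior = {"flu": 0.6, "asthma": 0.3, "rare-disease": 0.1}

guess = frequency_attack(ciphertexts, prior)
recovered = [guess[c] for c in ciphertexts]  # the SSI's guess for every tuple
```

In this toy run the SSI recovers every value without ever touching the key, which is why the choice of encryption on the grouping attributes matters.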






The main difficulty is with AGGREGATE QUERIES!!

  • Solutions vary depending on which kind of encryption is used, how the SSI constructs the partitions, and what information is revealed to the SSI.

  • Secure aggregation solution (presented briefly here)

  • Noise-based solutions (see paper)

    • random (white) noise
    • noise controlled by the complementary domain
  • Histogram-based solutions (see paper)

  • We investigate these solutions along two axes: performance and security.
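The two noise-based variants listed above can be sketched as follows. This is one plausible reading of the slide, not the authors' protocol: the function names, the `(value, is_real)` tuple layout, and the sample domain are assumptions, and in a real deployment the `is_real` flag would sit inside the encrypted part of the tuple, hidden from the SSI.

```python
import random
from collections import Counter


def white_noise(real_value, domain, nf, rng):
    """Real tuple plus nf fake tuples whose group values are drawn at random."""
    fakes = [(rng.choice(domain), False) for _ in range(nf)]
    return [(real_value, True)] + fakes


def complementary_noise(real_value, domain):
    """Real tuple plus one fake tuple for every other value of the domain."""
    fakes = [(v, False) for v in domain if v != real_value]
    return [(real_value, True)] + fakes


domain = ["A", "B", "C"]
reals = ["A"] * 6 + ["B"] * 3 + ["C"]  # skewed true distribution (invented)

# What the SSI would observe with complementary-domain noise: a flat histogram.
sent = [t for r in reals for t in complementary_noise(r, domain)]
seen_by_ssi = Counter(v for v, _ in sent)

# What a TDS recovers after decryption, once fake tuples are discarded.
true_counts = Counter(v for v, is_real in sent if is_real)

noisy = white_noise("A", domain, nf=2, rng=random.Random(0))
```

With complementary-domain noise every group value appears equally often on the SSI side, at the cost of sending one tuple per domain value; white noise sends only `nf` fakes per real tuple but only blurs the distribution.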





Secure Aggregation efficiency problem:

  • nDet_Enc (non-deterministic encryption) on AG → the SSI cannot gather tuples belonging to the same group into the same partition.
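A minimal sketch of why this is costly, assuming an iterative scheme in which the SSI can only cut the opaque tuples into arbitrary partitions and hand each one to a TDS for partial aggregation. The toy cipher below (SHA-256 keystream, no authentication) is not secure and only stands in for nDet_Enc; all names, the `reduction_factor` default, and the COUNT aggregate are assumptions.

```python
import hashlib
import json
import os
import random
from collections import Counter

KEY = b"key shared by the TDSs"  # hypothetical; never given to the SSI


def _keystream(key, nonce, n):
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def ndet_enc(key, obj):
    """Toy non-deterministic encryption: a fresh random nonce per call."""
    plain = json.dumps(obj).encode()
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plain))
    return nonce + bytes(a ^ b for a, b in zip(plain, stream))


def ndet_dec(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return json.loads(bytes(a ^ b for a, b in zip(body, stream)).decode())


def tds_aggregate(key, partition):
    """Runs inside a TDS: decrypt, merge per-group counts, re-encrypt."""
    partial = Counter()
    for blob in partition:
        partial.update(ndet_dec(key, blob))
    return ndet_enc(key, dict(partial))


def secure_count(blobs, tds, reduction_factor=4, rng=random):
    """SSI side: shuffles and splits opaque blobs; it never decrypts them."""
    if reduction_factor < 2:
        raise ValueError("reduction_factor must be at least 2")
    blobs = list(blobs)
    while len(blobs) > 1:
        rng.shuffle(blobs)
        parts = [blobs[i:i + reduction_factor]
                 for i in range(0, len(blobs), reduction_factor)]
        blobs = [tds(p) for p in parts]
    return blobs[0]


groups = ["A"] * 5 + ["B"] * 3 + ["C"]
blobs = [ndet_enc(KEY, {g: 1}) for g in groups]
final = secure_count(blobs, lambda part: tds_aggregate(KEY, part))
result = ndet_dec(KEY, final)  # per-group counts: A=5, B=3, C=1
```

Because partitions mix all groups, every partial aggregate may carry up to G entries and several rounds are needed, which is the efficiency issue the slide points at.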



  • The distribution of AG is discovered and distributed to all TDSs.

  • Each TDS allocates its tuple to the corresponding bucket.

  • Each TDS sends to the SSI: {h(bucketId), nDet_Enc(tuple)}

  • Consequences :
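The bucket allocation steps above can be sketched as follows, assuming buckets of nearly equal depth built greedily from the known AG distribution. The greedy rule, the salt, the city names and frequencies, and the byte strings standing in for `nDet_Enc(tuple)` are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict


def build_buckets(distribution, n_buckets):
    """Greedy near-equi-depth: heaviest value first, into the lightest bucket."""
    loads = [0] * n_buckets
    mapping = {}
    for value, freq in sorted(distribution.items(), key=lambda kv: -kv[1]):
        b = loads.index(min(loads))
        mapping[value] = b
        loads[b] += freq
    return mapping, loads


def h(bucket_id, salt=b"per-query salt"):
    """Hash of the bucket id; the SSI sees this tag, never the AG value."""
    return hashlib.sha256(salt + str(bucket_id).encode()).hexdigest()


def tds_message(mapping, ag_value, encrypted_tuple):
    """What one TDS sends to the SSI: {h(bucketId), nDet_Enc(tuple)}."""
    return h(mapping[ag_value]), encrypted_tuple


def ssi_partition(messages):
    """SSI side: group opaque tuples by bucket tag only."""
    parts = defaultdict(list)
    for tag, blob in messages:
        parts[tag].append(blob)
    return parts


distribution = {"Paris": 50, "Lyon": 30, "Nice": 20, "Lille": 10, "Brest": 10}
mapping, loads = build_buckets(distribution, n_buckets=2)

messages = [
    tds_message(mapping, "Paris", b"<opaque tuple 1>"),
    tds_message(mapping, "Lille", b"<opaque tuple 2>"),
    tds_message(mapping, "Lyon", b"<opaque tuple 3>"),
]
partitions = ssi_partition(messages)
```

Since every bucket carries roughly the same number of tuples, the tag frequencies seen by the SSI are close to uniform, while tuples of the same group still land in the same partition.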







Internal time consumption




  • Dataset size Ttuple: varies from 5 to 65 million

  • Number of groups G: varies from 1 to 10^6

  • Number of TDSs participating in the computation, as a percentage of all TDSs connected at a given time (Ttds): varies from 1% to 100%.

  • We fix two parameters and vary the third, measuring: execution time, parallelism of the protocol, total load, maximum load on one TDS

  • When the parameters are fixed:

  • Ttuple = 10^6, G = 10^3, % of TDSs connected = 10% of Ttuple.

  • We also compute and use the optimal value for all reduction factors as well as for.

  • In the figures, we plot two curves for the Rnf_Noise protocols, RN (nf = 2) and WN (nf = 1000), to capture the impact of the ratio of fake tuples.















Experimental scalability (experiments on the LIPN cluster), TODS’16





Total Load



  • Query shape: SELECT .. FROM .. WHERE .. GROUP BY AG

  • G = card(AG)

  • Security: S_Agg > ED_Hist

  • Performance:

    • G > 10: ED_Hist is faster than S_Agg
    • G <= 10: ED_Hist is slower than S_Agg
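The rule of thumb above can be written as a small selection helper. The function name and the `prefer_security` switch are assumptions added for illustration; only the threshold of 10 groups and the security ordering come from the slide.

```python
def choose_protocol(ag_values, prefer_security=False, threshold=10):
    """Pick a protocol from G = card(AG), following the slide's rule of thumb."""
    g = len(set(ag_values))  # G = number of distinct grouping values
    if prefer_security:
        return "S_Agg"       # slide: security of S_Agg > ED_Hist
    return "ED_Hist" if g > threshold else "S_Agg"


few_groups = ["M", "F"] * 100            # G = 2
many_groups = [str(i) for i in range(500)]  # G = 500

fast_small = choose_protocol(few_groups)
fast_large = choose_protocol(many_groups)
safe_large = choose_protocol(many_groups, prefer_security=True)
```

Here `fast_small` is "S_Agg", `fast_large` is "ED_Hist", and `safe_large` is "S_Agg", reflecting the trade-off between speed and the information revealed to the SSI.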




Short/medium-term research: data-intensive computing on an asymmetric architecture

  • SQL (with SMIS)

    • Queries here do not have joins!
    • Take into account more attack models (e.g., broken tokens)
    • Field experiment on usability (with ISN / A. Katsouraki PhD thesis)
    • Add usage control (A. Michel PhD thesis)
  • Private/Secure MapReduce (with LIPN; some results in CoopIS’15)

    • Investigate compatibility of our protocols.
    • Develop new protocols.
    • Check performance!
  • Secure graph computations (with LIX)

    • Study social networking applications
    • Secure k-core and k-truss computations (Rossi PhD thesis)
  • XML management

    • Adapt the work on XQ2P (Butnaru, Gardarin, Nguyen) to the Trusted Cells context.
    • Distributed Window Queries.


Promoting the Trusted Cells vision

  • Trusted Cells “Core”

    • Open hardware and software bundle : basic functionalities
      • Local DB
      • Distributed DB
      • NoSQL DB
    • → needed to develop PbD personal data management applications!
    • Promote an open source community around Trusted Cells (UVSQ, INSA CVL, ENSIIE, INSA Lyon…)
  • Beyond Tamper Resistant HW

    • Results are usable even with lower-trust elements.
    • Include social trust / reputation.
    • Use virtualization.









