Distributed Data Warehouses
Page 7/9 | Date: 01.11.2017 | Size: 446 B | #24870
Distributed Data Warehouses; Query Optimization; Data Access [and Control]; Data Mining
A piece of medical data (age, image, biological result, salient object in an image) has a meaning
- It conveys information that can be interpreted (in multiple ways!)
Meta-data can be attached to medical data… or not
- pre-processing is necessary
Medical data are often private
The medical data of a patient are often disseminated over multiple sites
- access rights/authentication problem
- collection/integration of data into partial views
- identification of data/users
Medical (meta-)data are complex and not yet (fully) standardized
Why is it so difficult?
- multiple administrative domains
- very sensitive data => security/privacy issues
- wide distribution
- unpredictability
- relationship with data replicas
- heterogeneity
- dynamicity (permanent production of large volumes of data)
Centralized data warehouse?
- Not realistic at a large scale and not acceptable
A proposition [Wehrle’s thesis]: virtual data warehouses on the grid
Components:
- a federated schema
- a set of partial views (“chunks”) materialized at the local system level
Advantages
- Flexibility with respect to users’ needs
- Good use of the storage capacity of the grid and scalability
- Security control at the local level
- Global view of the disseminated data
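The two components listed above can be illustrated with a minimal sketch. It is not taken from Wehrle’s thesis: the class names `Chunk` and `FederatedSchema`, the site names and the join on `patient_id` are all assumptions made for illustration. Each chunk stays at its local site; the federated schema only knows which site exposes which attribute and assembles a global view on demand.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A partial view materialized at one local site (hypothetical layout)."""
    site: str
    attributes: tuple
    rows: list = field(default_factory=list)  # each row is a dict


@dataclass
class FederatedSchema:
    """Records which chunks exist and builds a global view over them."""
    chunks: list = field(default_factory=list)

    def register(self, chunk):
        self.chunks.append(chunk)

    def locate(self, attribute):
        # Names of the sites whose chunk exposes the attribute.
        return [c.site for c in self.chunks if attribute in c.attributes]

    def global_view(self, key, attributes):
        """Join all chunks on `key` and keep only the requested attributes."""
        merged = {}
        for chunk in self.chunks:
            for row in chunk.rows:
                record = merged.setdefault(row[key], {key: row[key]})
                for name in attributes:
                    if name in row:
                        record[name] = row[name]
        return sorted(merged.values(), key=lambda r: r[key])


# Illustrative data only: two sites, each holding a different partial view.
schema = FederatedSchema()
schema.register(Chunk("hospital_a", ("patient_id", "age"),
                      [{"patient_id": 1, "age": 54},
                       {"patient_id": 2, "age": 37}]))
schema.register(Chunk("lab_b", ("patient_id", "biological_result"),
                      [{"patient_id": 1, "biological_result": "normal"}]))

view = schema.global_view("patient_id", ["age", "biological_result"])
```

In this sketch a patient known to only one site simply has fewer attributes in the global view; how missing data should really be handled is left open.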
Drawbacks and open issues
- maintenance protocols
- indexing tools
- access to data and negotiation
- query processing
Brokers act as interfaces between data, services and applications
Possible locations
- at the interface between the grid and the external data repositories
- on the grid storage elements
- at the interface between the grid and the user
- inside the network (e.g. routers)
Open issues
- caching: computation results, query partial results…
- data indexing
- prefetching
- user customization
- inter-broker collaboration
- a key issue: security and privacy
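As one possible reading of the caching issue, a broker could keep recent partial query results in a bounded least-recently-used cache. The sketch below is an assumption, not a design from the slides: `BrokerCache`, `run_on_storage_element` and the query strings are hypothetical, and the sketch deliberately ignores the security and privacy issue listed above.

```python
from collections import OrderedDict


class BrokerCache:
    """LRU cache for partial query results held by a broker (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, query, compute):
        """Return the cached result for `query`, or compute and store it."""
        if query in self._entries:
            self._entries.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self._entries[query]
        self.misses += 1
        result = compute(query)
        self._entries[query] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result


def run_on_storage_element(query):
    # Hypothetical stand-in for an expensive evaluation on the grid.
    return f"result of {query}"


cache = BrokerCache(capacity=2)
cache.fetch("q1", run_on_storage_element)  # miss
cache.fetch("q2", run_on_storage_element)  # miss
cache.fetch("q1", run_on_storage_element)  # hit, q1 becomes most recent
cache.fetch("q3", run_on_storage_element)  # miss, evicts q2
```

Least-recently-used eviction is only one choice; a real broker would also have to decide when a cached result becomes stale because new data were produced.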
Medical data belong to the patient, who should be able to grant access rights to whomever he or she wants
To whom do processed (even anonymized) data belong?
How can one combine privacy with dissemination/replication/caching?
What about traceability?
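The first and last points can be made concrete with a small sketch in which only the patient may grant access and every read attempt, allowed or not, is traced. The class `PatientRecord`, its method names and the user names are assumptions made for illustration; the sketch does not address the ownership of processed data or the interaction with replication and caching.

```python
from datetime import datetime, timezone


class PatientRecord:
    """Medical data whose access rights are controlled by the patient."""

    def __init__(self, patient, data):
        self.patient = patient
        self._data = data
        self._granted = set()
        self.audit_log = []  # (timestamp, user, allowed) tuples

    def grant(self, requester, user):
        # Only the patient can hand out access rights.
        if requester != self.patient:
            raise PermissionError("only the patient can grant access")
        self._granted.add(user)

    def revoke(self, requester, user):
        if requester != self.patient:
            raise PermissionError("only the patient can revoke access")
        self._granted.discard(user)

    def read(self, user):
        allowed = user == self.patient or user in self._granted
        # Traceability: every attempt is logged before any data is returned.
        self.audit_log.append((datetime.now(timezone.utc), user, allowed))
        if not allowed:
            raise PermissionError(f"{user} has no access to this record")
        return self._data


record = PatientRecord("alice", {"age": 54})
record.grant("alice", "dr_martin")
data_seen = record.read("dr_martin")

try:
    record.read("insurer")
    denied = False
except PermissionError:
    denied = True
```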
Structure of the data: few records, many attributes
Parallelizing data mining algorithms for the grid
- volatility of the resources (data, processing)
- fault tolerance, checkpointing
- distribution of the data: local data exploration + aggregation function to converge towards a unified model
- incremental production of the data => active data mining techniques
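The "local data exploration + aggregation function" pattern can be sketched with a deliberately simple stand-in for a model: the global mean and variance of one attribute. Each site computes a small summary locally, and only the summaries travel to the aggregation step. The function names and the sample values are assumptions; real data mining algorithms would exchange richer local models.

```python
def local_summary(values):
    """Computed at each site: count, sum and sum of squares."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return (n, total, total_sq)


def aggregate(summaries):
    """Combine the local summaries into a global mean and variance."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, variance


# Illustrative values only: the raw data never leave their site.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0]

summaries = [local_summary(site_a), local_summary(site_b)]
mean, variance = aggregate(summaries)
```

Because the summaries are additive, data produced later can be folded in by appending one more summary and calling `aggregate` again, which is one simple way to cope with incremental production.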
University of Virginia
Object-oriented approach
- Objects = data, applications, sensors, computing resources, codes…: everything is an object!
- Loosely coupled codes
- Single name space
- Reuse of existing OS and protocols; definition of message formats and high-level protocols
- Core objects: naming, binding, object creation/activation/deactivation/destruction
- Methods: description via an IDL
- Security: in the hands of the users
- Resource allocation: a site can define its own policy