Overall strategy
The JRAs are not exclusively about providing the services required to manage the lifecycle of data, from generation through harmonization, annotation, processing, sharing, and reuse. Rather, they will also provide the technology needed to: manage the Data e-Infrastructure at affordable cost; orchestrate services in repeatable execution flows that are stored, published, and reused as workflows; participate in the standardisation of the Data e-Infrastructure resource profiles, contributing to the sustainability of this initiative; and offer high-level APIs for operations on the data that hide the complexity inherent in all distributed Data e-Infrastructures.
JRA activities will not start from scratch. They will start from large open-source software initiatives widely adopted in existing e-Infrastructures, and they will contribute to these initiatives with enhanced versions and new software packages capable of enforcing declaratively specified policies. Project members already support open-source initiatives such as the gCube and OpenSDMX frameworks. gCube offers platforms and services to manage and manipulate data and metadata in an autonomically managed Data e-Infrastructure. gCube supports interactive and web-based processes that are compliant with the standards defined by the Open Geospatial Consortium and OASIS committees. It allows access to computational and storage resources managed through the gLite grid middleware, and the MoU signed between D4Science and EMI (European Middleware Initiative) will make it easy to integrate the new generation of grid middleware.
To achieve the above objectives, a set of activities is planned and organised into four interacting work packages:
- JRA1 – iMarine Data e-Infrastructure Enabling-technology Development will enhance the gCube Enabling Services that enable the operation of the Data e-Infrastructure by: (i) supporting the deployment of the e-Infrastructure services; (ii) interfacing with external infrastructures; (iii) enforcing resource usage policies defined by the EA-CoP; (iv) distributing process executions to several computational platforms.
- JRA2 – Data Management Facilities Development will integrate, enhance, and develop a set of services for managing statistical data (including but not limited to time series); marine biology data; environmental data such as satellite data and sensor data; taxonomies, ontologies, and code-lists; structured and semi-structured textual data; and binary data. The management facilities will include the software components needed to deliver data access and storage, data transfer, data assessment, data certification, and data harmonization.
- JRA3 – Data Consumption Facilities Development will develop a set of facilities for supporting the data processing tasks the EA-CoP faces. These facilities include services for: (i) data discovery and retrieval; (ii) generation and manipulation of data; (iii) mining and extraction of knowledge from raw data; (iv) generation of data provenance information; (v) data transformation; and (vi) visualization and simulation of scientific data.
- JRA4 – iMarine Data e-Infrastructures Integration and Interoperability Facilities Development will develop a set of facilities that support the exploitation of available and emerging facilities in the iMarine Data e-Infrastructure by applications, mainly but not only those developed in SA2. These facilities will be based on identifying standards at the boundaries of individual services and of the infrastructure as a whole, for interactions with its elements or with the infrastructure itself, and on providing programmatic APIs that enable integration and interoperability for and beyond the adopted specifications.
The software realised by JRA1 activities will start from the solid base of the gCube framework. It will enhance the gCube enabling technology by contributing to the open-source community that maintains it. This will make it possible to receive feedback and revisions from members external to the iMarine Data e-Infrastructure as well, initiating a virtuous cycle with other Data e-Infrastructures that rely on the same foundations. JRA2 and JRA3 will start from the data facilities offered by the gCube and OpenSDMX frameworks and will complement and enhance them to create a complete software suite for data storage and management. JRA4 will then make the results of those activities simpler to use without reducing their functionality. This last activity will provide a simple interface, with adapters for the corresponding services, that allows developers to invoke services in a common way across the Data e-Infrastructure and across different protocols and standards.
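The adapter-based interface described for JRA4 can be illustrated with a minimal sketch. It is not the gCube or iMarine API: every class, method, and dataset name below is hypothetical, and the two adapters return canned data instead of contacting a real service. The sketch only shows the idea of one common call that hides protocol-specific details.

```python
from abc import ABC, abstractmethod


class DataServiceAdapter(ABC):
    """Hypothetical common interface implemented by every protocol-specific adapter."""

    @abstractmethod
    def fetch(self, dataset_id: str) -> dict:
        """Return a dataset as a plain dictionary, whatever the backend protocol."""


class SdmxAdapter(DataServiceAdapter):
    """Stand-in for an adapter over an SDMX-style statistical service."""

    def fetch(self, dataset_id: str) -> dict:
        # A real adapter would query the statistical service here; this stub does not.
        return {"id": dataset_id, "protocol": "sdmx", "records": [1, 2, 3]}


class OgcAdapter(DataServiceAdapter):
    """Stand-in for an adapter over an OGC-style geospatial service."""

    def fetch(self, dataset_id: str) -> dict:
        # A real adapter would call the geospatial service here; this stub does not.
        return {"id": dataset_id, "protocol": "ogc", "records": [(10.0, 42.5)]}


class InfrastructureClient:
    """Hypothetical single entry point: callers name a functional area, not a protocol."""

    def __init__(self) -> None:
        self._adapters: dict[str, DataServiceAdapter] = {}

    def register(self, area: str, adapter: DataServiceAdapter) -> None:
        self._adapters[area] = adapter

    def fetch(self, area: str, dataset_id: str) -> dict:
        if area not in self._adapters:
            raise KeyError(f"no adapter registered for {area!r}")
        return self._adapters[area].fetch(dataset_id)


client = InfrastructureClient()
client.register("statistics", SdmxAdapter())
client.register("geospatial", OgcAdapter())

# The same call shape works for both areas, regardless of the underlying protocol.
stats = client.fetch("statistics", "catch-2010")
geo = client.fetch("geospatial", "sea-surface-temp")
```

Under these assumptions, supporting a further protocol means registering one more adapter; code that calls `client.fetch` does not change.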
GANTT Diagram
Detailed work description
Work package list
Work package No | Work package title | Type of activity | Lead participant No | Lead participant short name | Person-months | Start month | End month
JRA1 | iMarine Data e-Infrastructure Enabling-technology Development | RTD | 2 | CNR | 59 | 1 | 24
JRA2 | Data Management Facilities Development | RTD | 2 | CNR | 91 | 1 | 27
JRA3 | Data Consumption Facilities Development | RTD | 3 | NKUA | 107 | 1 | 27
JRA4 | Data e-Infrastructures Integration and Interoperability Facilities Development | RTD | 3 | NKUA | 39 | 1 | 28
TOTAL | | | | | 296 | |
JRA1 – Data e-Infrastructure Enabling-technology Development
The goal of this work package is to deliver the high-quality technology that enables the iMarine Data e-Infrastructure. It will receive input from NA3, while its outcomes will be exploited mainly by the Service Activity and by the development activities of the other JRA WPs. Moreover, the work package will develop new facilities that lay the ground for effective integration with external technologies and platforms (such as Cloud platforms, EGEE/EGI and Genesi-DR Data infrastructures, OpenSDMX applications, and so on). To meet these objectives, the work package will extend the gCube enabling technology produced by the D4Science II project to tackle the new challenges posed by iMarine's expectations.
The activity performed by this work package is organized as follows: (i) develop new facilities for the management of the Data e-Infrastructure, enhance the gCube development framework and runtime environment, facilitate the integration of services and applications already available to the EA-CoP, and promote linking with Cloud platforms (TJRA1.1); (ii) enhance the current solutions for authentication, authorization, accounting, and auditing to take into account the declaratively specified policies defined by the EA-CoP and to guarantee highly controlled resources, with the aim of fostering the scalability and interoperability of the delivered technology (TJRA1.2); (iii) enhance gCube facilities for the definition, hosting, and execution of scientific and management workflows (TJRA1.3); (iv) improve the gCube Resource Model towards the more open-ended extensibility needed to model new internal and external entities, such as EA-CoP applications, Cloud resources, external data sources, etc. (TJRA1.4).
All the development tasks described will be carried out using an Agile-like software development approach. The internal software development work plan will be reported in three milestones (MJRA1.1.1, MJRA1.2.1, MJRA1.3.1), each belonging to the corresponding task. These milestones will result in multiple deliveries over the project lifetime, starting from M3 and M4. The gCube software released by the work package will be documented in DJRA1.1 – iMarine Data e-Infrastructure Enabling-technology Software, which will be updated every three months. Finally, the Resource Model defined within TJRA1.4 will be described in DJRA1.2 – iMarine Data e-Infrastructure Resource Model.
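As an illustration of what enforcing declaratively specified policies (task TJRA1.2) can mean in practice, the following minimal sketch keeps the policies as plain data and checks every request against them, denying anything not explicitly granted. The roles, resource types, and actions are invented for the example and are not taken from gCube or from the EA-CoP policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One declarative rule: a role may perform these actions on a resource type."""
    role: str
    resource_type: str
    actions: frozenset


# Hypothetical policy set; in a real deployment this would be loaded from configuration.
POLICIES = [
    Policy("data-manager", "dataset", frozenset({"read", "write"})),
    Policy("scientist", "dataset", frozenset({"read"})),
    Policy("scientist", "workflow", frozenset({"read", "execute"})),
]


def is_allowed(role: str, resource_type: str, action: str, policies=POLICIES) -> bool:
    """Deny by default; allow only if some policy explicitly grants the action."""
    return any(
        p.role == role and p.resource_type == resource_type and action in p.actions
        for p in policies
    )


can_read = is_allowed("scientist", "dataset", "read")
can_write = is_allowed("scientist", "dataset", "write")
```

Because the rules are data rather than code, changing who may do what only requires editing the policy list, which is the property a declarative approach is meant to provide.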
JRA2 – Data Management Facilities Development
The work package focuses on the data management area, in particular on the process of managing the datasets available to the EA-CoP, including services for managing statistical data, marine biology data, environmental data such as satellite data and sensor data, taxonomies and code-lists, etc. Like the other JRA WPs, this work package will receive input from NA3.
The work package will grant access to heterogeneous data repository systems and datasets through common and standard protocol(s) and will harmonise access to the different document models they may expose. Other development activities will be dedicated to document access, data access, and file storage. The final solutions will be delivered in the context of gCube's Content Management Architecture (CMA), the data management solution currently available in the gCube system.
The main objectives of the work to be undertaken in this work package are to: (i) harmonise access to the widest possible variety of models and protocols by adapting gCube's inner type for document access, the gCube Document Model (gDM), to the document access types (models and protocols) exposed by the EA-CoP services selected for integration (TJRA2.1); (ii) develop facilities for efficient data transfer over standard transfer/network protocols to ease the exchange of large amounts of data within the Data e-Infrastructure (TJRA2.2); (iii) provide facilities to assist EA-CoP members in the assessment, harmonization, and certification of data (TJRA2.3).
Each task will result in a number of milestones defining the specification of the facilities designed and implemented in the context of the task itself. MJRA2.1.1 will report on the facilities designed and implemented as part of task TJRA2.1 for data access and storage; MJRA2.2.1 will report on the facilities designed and implemented as part of task TJRA2.2 for data transfer; and finally, MJRA2.3.1 will report on the facilities designed and implemented as part of task TJRA2.3 for data assessment, harmonization, and certification. The first version of each milestone will be available at M4, and all of them will be updated quarterly. The software and documentation of the components released by the work package (namely, the Data Management suite) will be described in DJRA2.1 – Data Management Software.
JRA3 – Data Consumption Facilities Development
The objective of work package JRA3 is to provide a full suite of instruments (concepts, specifications and software components) for the consumption of data in the various stages of their evolution (data/information/knowledge) within the e-Infrastructure, often leading to the production of new data/information/knowledge.
The development of JRA3 is based on the products of JRA1 and JRA2. The former provides the enabling means of the Data e-Infrastructure and the instruments to carry out the computationally intensive tasks of the WP, while the Data Transfer, Access and Storage facilities provided by JRA2 are essential means for managing the exchange and persistence of data sets. Data assessment, harmonization, and certification (TJRA2.3) are also a common factor in scientific data processing flows.
The tasks of the work package are structured as follows.
Under the work planned in TJRA3.1, the existing information retrieval facilities of the gCube platform will evolve in three directions: standardization, functionality, and performance. Task TJRA3.2 will develop facilities for advanced and large-scale data creation, update, and deletion. Task TJRA3.3 will go far beyond strengthening the existing platform by offering a rich suite of Data Mining tools (algorithms) that will act on data sets handled by the project's communities in several domains, knowledge management and statistics being the ones directly identifiable. Task TJRA3.4 will handle two challenges of scientific data handling: visualization and exploitation from the simulation perspective. Both generic and data-set-specific visualization methods will be provided, and similarly both generic and domain-specific simulation methods will be supported, driven by the requirements and products of other activities (namely TJRA3.3). Finally, TJRA3.5 will bring Semantic Data Analysis features into gCube, in the form of software that will facilitate the crossing of scientific and administrative domain boundaries in the Data e-Infrastructure for both programmatic and human actors.
JRA3 will periodically deliver its main product, i.e. the software along with its documentation (DJRA3.2), and will also provide a formal specification of the Query Language (DJRA3.1a/b). Specifications for all the components will also be provided in a timely manner (delivered as milestones).
JRA4 – Data e-Infrastructure Exploitation Facilities Development
Work package JRA4 aims to provide a broad set of specifications and programmatic interfaces for essential groups of the infrastructure services. These will facilitate interoperability of the iMarine Data e-Infrastructure with external ones and will allow the integration of the entire infrastructure, or subsets of it, into large data ecosystems.
An essential means to this end is decoupling external entities from the complexity of the Data e-Infrastructure, as currently offered in the gCube system by the Application Support Layer, which becomes the basis of the iMarine developments. Within JRA4, this layer of libraries and APIs, which has evolved as needs arose, will be promoted into a self-contained layer, architecturally restructured, and functionally extended to cover new and extended service areas.
More specifically, TJRA4.1 will tackle the definition of the architecture of the Integration and Interoperability Layer (IIL for short) in a manner that allows systematic partitioning of its elements, leading to an independent evolution of its vertical (functional-area-specific) and horizontal (layered dependencies) component groups. The remaining two tasks, TJRA4.2 and TJRA4.3, will build on TJRA4.1, exploiting the specifications and components delivered by JRA2 and JRA3 respectively. They will essentially provide the "API of the Infrastructure" for the functional areas of Data Management and Data Consumption, exposing a large set of services, such as Search, Storage, and Data Access, to the outside world.
The results of the work package will be provided in the form of specifications (deliverable DJRA4.1) and software components (included in milestones MJRA4.2.1 and MJRA4.3.1). Additionally, the progress of the activity will be reported in deliverable DJRA4.2.