Integration and Interoperability Framework Definition — JRA4 — PU — M3
DJRA4.2-9 Application Programming Interface Software — JRA4 — O — PU — M6, M9, M12, M15, M18, M21, M24, M28
Work package descriptions
Work package number: JRA1
Start date or starting event: M1
Work package title: iMarine Data e-Infrastructure Enabling-technology Development
Activity Type: RTD
Participants 1-7: ERCIM, CNR, NKUA, CERN, E-IIS, US, FORTH
Person-months per participant: 16, 13, 16, 14
Participants 8-14: Terradue, Trust-IT, FAO, FIN, UNESCO, CRIA, IRD
Objectives
The main objective of this work package is to enhance the gCube Enabling Services that provide for the operation of the iMarine Data e-Infrastructure. These services will insulate as much as possible the management of the e-infrastructure from the data and the data management services that are hosted in or accessible through the infrastructure itself.
Work package activities will be dedicated to deploying the iMarine Data e-Infrastructure, interfacing with external infrastructures, enforcing the resource usage policies defined by the CoP, and distributing process executions across several computational platforms. In moving towards these targets, the work package activities will also reorganise gCube's Resource Model.
Overall, the work package will assure the correct implementation of the policy defined by the iMarine Board for the governance of the Data e-Infrastructure, and it will support the Service Activity in the deployment of its services, with the objective of simplifying the periodic upgrades that follow the software releases and of maximising the sharing of resources and applications.
Description of work
Work package leader: CNR;
TJRA1.1: iMarine Data e-Infrastructure Management Facilities
Task leader: CNR; Participants: US;
The main objective of this task is to develop new facilities for the management of the iMarine Data e-Infrastructure. These facilities will be dedicated to promoting the integration and exploitation of technology external to the e-Infrastructure (e.g. Cloud technology), as well as to supporting the development activities of the other JRA work packages.
In more detail, the following activities are planned:
extend the Information System services to support policy-based views of available resources;
extend the Resource Management services to facilitate the integration of services and applications already available to the EA-CoP. Support for autonomic deployment, monitoring, and management, which is currently available for native gCube services and service plug-in components, will be extended to external services and applications;
evaluate and introduce a WS-Management implementation (e.g. Wiseman [12]) for services and resource coordination;
extend gCore, the application framework for gCube services, towards programming models and patterns that allow partial and incremental adoption of its abstractions;
TJRA1.2: iMarine Data e-Infrastructure Policy-oriented Security Facilities
Task leader: E-IIS; Participants: CNR;
The main objective of this task is to enhance the current solutions for authentication, authorization, accounting, and auditing to take into account the declaratively-specified policies defined by the CoP. The starting point for this activity will be the Authentication and Authorisation (AA) components developed in gCube and componentised in SOA3, an open source solution for AA federation in multi-domain computing environments developed by E-IIS. In addition, Federated Accounting and Auditing features will be integrated, respectively to ensure that data is available for billing purposes (enhancing sustainability prospects) and to strengthen users' trust in the infrastructure.
SOA3 will include facilities to encrypt/decrypt data.
In particular, the following activities are planned:
Revise the architecture of the current AA solution of gCube to enhance its reusability;
Make it interoperable with possible evolutions of the identity management and policy management solutions offered by EMI, EGI-InSPIRE, or EIF;
Extend it with an advanced federated accounting solution, enabling billing components to be plugged in;
Extend it with auditing facilities to allow the enforcement of security policies.
TJRA1.3: Workflow Management Facilities
Task leader: NKUA; Participants: CNR, US;
The main objective of this task is to enhance gCube facilities for the definition, hosting, and execution of scientific and management workflows, in such a way that the workflows become flexible, reusable, multipurpose resources that provide services of higher complexity to the infrastructure and its users. The vision behind the objective is to complement the pool of basic data management and data consumption services with new members of composite capacity, thus abstracting over platform internals and strengthening the capacities of gCube as a Platform-as-a-Service (PaaS) technology.
In more detail, the following activities are planned for the task:
formalisation of a workflow definition language and implementation of appropriate parsing and translation mechanisms, capable of capturing the nature of the resources that reside inside and around a gCube infrastructure into complex automation and business process scenarios;
extend workflow technology to satisfy the needs of the workflow definition language and to provide the appropriate abstractions to the execution engine;
enhance gCube’s Process Execution Engine with new features in the area of interoperability, optimisation, failure resilience, workflow logic embedding and resource abstraction, which will strengthen its PaaS cloud-enabling capacities for the data e-Infrastructure (resource discovery, state access, etc);
enhance gCube’s Process Execution Engine to further exploit Cloud integration scenarios (e.g. EC2, EMI outcome) and cloud infrastructures;
develop a gCube service as a dynamic wrapper of persistent workflow resources, so that the latter may be parametrically invoked on a par with other gCube services in further workflow definitions.
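To give an intuition of the kind of abstraction such an execution engine provides, the following sketch (purely illustrative; the step names and the executor are hypothetical and not part of gCube) runs a small workflow expressed as a dependency graph, executing each step only after its prerequisites:

```python
from graphlib import TopologicalSorter

# A toy workflow: each key names a step, the set lists the steps it depends on.
# Step names are illustrative only.
workflow = {
    "harvest":   set(),                  # no prerequisites
    "harmonise": {"harvest"},            # runs after harvest
    "index":     {"harmonise"},
    "publish":   {"harmonise", "index"},
}

def run_workflow(graph, actions):
    """Execute steps in an order compatible with their dependencies."""
    order = []
    for step in TopologicalSorter(graph).static_order():
        actions[step]()                  # invoke the step's action
        order.append(step)
    return order

log = []
actions = {s: (lambda s=s: log.append(s)) for s in workflow}
order = run_workflow(workflow, actions)
```

A real engine would add the features the task lists (interoperability, optimisation, failure resilience), but the core contract — declare dependencies, let the engine schedule — is the same.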
TJRA1.4: Resource Model
Task leader: US; Participants: CNR, NKUA;
The gCube Resource Model defines what is shared in the data e-Infrastructure, and therefore what can be discovered and used by the software components that operate within it. While resource discovery is functionality within the scope of TJRA1.1, this task focuses on the model itself, i.e. the resources, their roles, and their inter-relationships.
In more detail, the following activities are planned:
add open-ended extensibility mechanisms in the Resource Model in cooperation with TJRA1.2 and TJRA1.3. Key areas for extension revolve around the notion of executable resource and data resource;
analyse the models adopted by other data e-Infrastructures such as EGEE/EGI, and Genesi-DR, and evaluate the relevance of the GLUE model [2] to the evolution of the Resource Model;
analyse other external models, including commercially available enterprise solutions.
Deliverables
DJRA1.1-8 iMarine Data Infrastructure Enabling Software, which contains the software and documentation of the components that comprise the e-Infrastructure Management suite (M3, quarterly updated), Type: Other (online software repository)
DJRA1.2 iMarine Data Infrastructure Resource Model, which contains the resource model adopted by the iMarine Data Infrastructure (M20), Type: Other (online document)
Work package number: JRA2
Start date or starting event: M1
Work package title: Data Management Facilities Development
Activity Type: RTD
Participants 1-7: ERCIM, CNR, NKUA, CERN, E-IIS, US, FORTH
Person-months per participant: 27, 9, 19, 18, 9
Participants 8-14: Terradue, Trust-IT, FAO, FIN, UNESCO, CRIA, IRD
Person-months per participant: 9
Objectives
The main objective of this work package is to integrate and enhance a set of services for managing the datasets available to the EA-CoP, including services for managing statistical data (including but not limited to time series), marine biology data, environmental data such as satellite and sensor data, taxonomies, code-lists, etc.
In more detail, the following activities are planned:
access to heterogeneous data repository systems through common and standard protocols;
generation and manipulation of data;
generation of provenance information and its attachment to the data;
quality assessment;
harmonisation of metadata to ensure interoperability within and across e-Infrastructures;
certification;
publication to make data available and promote the sharing through the standards recognised by the EA-CoP and the scientific communities;
efficient, secure, and reliable data transfer.
Description of work
Work package leader: CNR;
TJRA2.1: Data Access and Storage Facilities
Task leader: US; Participants: CNR;
The main objective of this task is to offer facilities for standard and uniform network access to datasets of varying semantics hosted in multiple and heterogeneous repositories, including content management systems, databases, and file storage systems. Several datasets are currently made available to the EA-CoP under a variety of models and through a variety of protocols. This technological heterogeneity raises significant integration barriers to the timely exploitation of data that is key to the implementation of the EA. The activities in this task will pursue interoperability solutions that: (i) do not require immediate changes to local practices for data management and dissemination, but (ii) can gradually induce technological convergence within the EA-CoP towards common standards. For the most part, these solutions will be sought in the context of gCube’s Content Management Architecture (CMA) and in accordance with its hourglass topology of inner access types for integration and outer access types for dissemination. Different activities will be dedicated to document access, data access, and file storage.
In more detail, the following activities are planned:
Adapt gCube’s inner type for document access, the gCube Document Model (gDM), to the document access types (model and protocols) exposed by the EA-CoP services selected for integration (e.g. Darwin Core, ABCD, Obis Schema, CSGDM, GCMD). The activity will result in plug-ins for key CMA services, such as the Content Manager service and the View Manager service;
Adapt the gDM to the document access types of the EA-CoP services selected for integration. This “backwards” translation will result in new CMA services which will provide access to the totality of the content integrated by the previous activity under the access type preferred for upstream processing;
Define new inner types for document access in terms of canonical models of gDoc tree content and in correspondence with the data access types of EA-CoP services selected for integration (e.g. SDMX, various ontology and taxonomy representation standards). This activity will result in schema definitions and the integration or development of client libraries.
Define bilateral adaptations between the inner types for data access defined in the previous activity and the data access types of EA-CoP services selected for integration. As for activities related to document management, this activity will result in plugins for existing CMA services and in new CMA services.
Develop a CMA service for seamless storage of files across a variety of remote storage systems (e.g. HDFS, SRM, SRB). The service will have a standard, POSIX-based interface supportive of storage management policies (quotas, reservations, access rights).
Extend existing and new key CMA services, plugins, client libraries, document models, and data models with support for recording provenance-related process documentation.
TJRA2.2: Data Transfer Facilities
Task leader: CERN; Participants: NKUA, Terradue;
The main objective of this task is to integrate and enhance a set of facilities for reliable data transfer between the nodes of the Data e-Infrastructure. The EA-CoP manages a large set of multi-type datasets distributed across different repositories. To promote an efficient and optimised consumption of these data resources, the infrastructure must provide facilities for data transfer across nodes. This task will provide a secure, reliable, and efficient solution allowing other deployed services to move different data types between remote infrastructure nodes under different transfer protocols (e.g. srm-copy, gridFTP, HTTPS, BitTorrent, OPeNDAP, WCS, WMS, WFS, etc.) and through the combination/optimisation of state-of-the-art technologies (i.e. high-bandwidth networks, peer-to-peer). This task will also work on a data transfer mechanism to pass data by reference between infrastructure services, relying on a list of records that are part of a specific record set. The data transfer facilities will support multiple transfer requests, minimise network load, avoid storage overload, support prioritisation, manage transfer shares at service and user level, and allow data transfer parameterisation.
In more detail, the following activities are planned:
Develop a gCube service to integrate existing services and technologies to deliver an efficient, secure, and reliable data transfer commodity;
Support the transfer of multiple data formats over different transfer protocols;
Support the transfer of data by reference between infrastructure services by exploiting record sets;
Support advanced data transfer functionality to manage transfer priorities, transfer requests, transfer shares, and system overload.
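The prioritisation requirement above can be pictured with a minimal scheduler sketch. All names here are hypothetical illustrations, not gCube APIs; a real service would add transfer shares, quotas, and protocol negotiation on top of the queueing discipline:

```python
import heapq
import itertools

class TransferQueue:
    """Toy priority queue of transfer requests: lower number = higher
    priority; ties are served in submission order (FIFO)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker preserving arrival order

    def submit(self, source, destination, priority=10):
        heapq.heappush(self._heap, (priority, next(self._seq), source, destination))

    def next_transfer(self):
        _, _, source, destination = heapq.heappop(self._heap)
        return source, destination

q = TransferQueue()
q.submit("node-a:/data/ts1.csv", "node-b:/incoming/", priority=5)
q.submit("node-a:/data/bulk.tar", "node-c:/incoming/", priority=20)
q.submit("node-d:/data/ts2.csv", "node-b:/incoming/", priority=5)
```

Equal-priority requests drain in arrival order, while the bulk transfer waits — the same behaviour the task requires at service and user level.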
TJRA2.3: Data Assessment, Harmonization and Certification Facilities
Task leader: CNR; Participants: FORTH;
The main objective of this task is to provide facilities to assist EA-CoP members in the assessment, harmonization, and certification of data. Several services will be integrated and customized to cover the data types of relevance to the EA-CoP. The harmonization of time-series data, for example, requires an enhanced version of the corresponding gCube service that exploits reference data, code lists, and taxonomies to identify syntactic errors, missing information, and erroneous observations.
The following approach will be adopted:
Harmonization and quality assessment. This phase deals with heterogeneous data authentication, measurement, and merging. Time series incoming from several participants or agencies, for example, have to be associated with reference structures; tabular data and any other semi-structured data have to be assessed by using the taxonomies, vocabularies, and ontologies accessible in the Data e-Infrastructure. The quality assessment phase also includes coherence control, as well as a certification facility for users' data. This corresponds to the so-called "curation" phase of the analysis, for which a series of software components will be integrated, implemented, and released to form a framework of components reusable in tailored workflows.
Semi-automatic supervision of data. A lexical similarity algorithm will be introduced for the syntactic correction of entries in scientific data, for linking those data with reference structures, and for supporting automatic merging procedures. A minimum edit distance algorithm, properly trained by users to weight the data against reference structures, will support this activity.
Certification and data verification. A verification phase will be provided to the user in order to certify the overall coherence of the supervised data. Manual or automatic certification of data structure coherence will be supplied, with the possibility to manually supervise the treated scientific data. Automatic data-checking algorithms will verify whether the data types and transcriptions are coherent with the reference structure. Errors will be computed and highlighted so that users can correct them.
Provenance information will always be generated and attached to the harmonised data objects. Relevant standards for data provenance description will be considered and enhanced to cope with the richness and diversity of the data the iMarine Data e-Infrastructure will deal with.
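The minimum edit distance mentioned for semi-automatic supervision can be sketched as the classic dynamic-programming algorithm. Costs are uniform here; the user-trained variant described above would replace the unit costs with learned weights:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insertion,
    deletion, and substitution."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

Matching a noisy species name against a reference list then amounts to picking the entry at the smallest distance, e.g. `edit_distance("Thunnus alalunga", "Thunnus alalonga")` is 1.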
Deliverables
DJRA2.1-9 iMarine Data Management Software, which contains the software and documentation of the components that comprise the Data Management suite (M4, M6, M9, M12, M15, M18, M21, M24, M26), Type: Other (online software repository)
Work package number: JRA3
Start date or starting event: M1
Work package title: Data Consumption Facilities Development
Activity Type: RTD
Participants 1-7: ERCIM, CNR, NKUA, CERN, E-IIS, US, FORTH
Person-months per participant: 18, 27, 22
Participants 8-14: Terradue, Trust-IT, FAO, FIN, UNESCO, CRIA, IRD
Person-months per participant: 18, 22
Objectives
The main objective of this work package is to develop a set of facilities supporting the data processing tasks the EA-CoP faces. These facilities include services for:
data discovery;
generation and manipulation of data;
mining and extraction of knowledge from raw data;
generation of provenance information and linking of this information to the data;
data transformation.
Description of work
Work package leader: NKUA;
TJRA3.1: Data Retrieval Facilities
Task leader: NKUA; Participants: FORTH, Terradue;
The main objective of this task is to provide advanced data discovery, building on and extending the Information Retrieval constructs of gCube. In particular the following activities will be carried out:
formalisation of a Data Discovery language (i.e. Query Language) that will embrace all the capacities of the infrastructure, including filtering, projection, semantic matching, feature matching, geospatial/temporal matching and coverage of more data types that will emerge during the project’s lifetime;
implementation of support of Information Retrieval over new datatypes to emerge;
definition of interfaces for semantic information retrieval and integration of the gCube Search Service with the mechanisms that will be offered by the enhanced, fully fledged ontology management toolkit provided by task TJRA3.4;
strengthening of the geospatial/temporal IR capabilities of gCube with the inclusion in IR path of existing OGC compliant services;
provision of the federated Information Retrieval capacities for non-cooperative textual sources, and investigation of support for non-textual ones;
enhancement of the performance and robustness capacities of the gCube information retrieval mechanisms;
evolution of all existing gCube IR constructs (Indexing, Search, Personalisation) under new functional and non-functional demands;
TJRA3.2: Data Manipulation Facilities
Task leader: CNR; Participants: NKUA, FAO;
The main objective of this task is to provide the facilities for advanced and large-scale data creation, update and deletion building on and extending gCube. In particular the following activities will be carried out:
specification of a Data Manipulation Framework that will leverage the benefits of the infrastructure, including updating, deleting and creation of data in existing large and distributed data sets, such as tabular data;
Extension of the powerful open framework that allows chains of transformations to be built. This framework will be extended to deal with tabular data, time series, and large scientific data sets;
Support for the definition of Data Manipulation access control;
Definition of integration of restricted access (private) data sets with public access data sets;
Definition of aggregation facilities over spatial, temporal and other dimensions, also to safeguard privacy and confidentiality issues;
Provision of efficient extraction facilities of data from large data sets;
Enhancement of the performance and integration capacities of the gCube information mechanisms with CoP-developed adapters or other access mechanisms (e.g. for SDMX generation/manipulation, the storage layer may not be SDMX-aware and could interact through an adapter).
TJRA3.3: Data Mining Facilities
Task leader: CNR; Participants: Terradue, FAO;
The main objective of this task is to integrate services and libraries for data mining in order to enforce the policies identified by the EA-CoP and to deal with data type heterogeneity. The premise is that the scientific data to be analysed have already been treated and that their coherence has been certified.
Data mining functionalities principally aim at finding similarities among data or at extracting "hidden" properties such as periodic trends. The resulting models can also be used to extract frequent patterns from data, or they can be run in generative mode in order to predict the future behaviour of some properties.
Data mining techniques will be packaged in a library that other services will integrate to deliver advanced services and that users can interrogate from a graphical interface.
The following data mining techniques will be employed:
Data clustering on the basis of descriptions. Incoming scientific data can be viewed as documents, identified by the description of the reported attributes and linked to the contents represented by the data. Clustering algorithms, such as the k-means algorithm, can be used to calculate similarities among tabular data or their entries.
Bibliometric index calculation on textual time series descriptions. Similarly to data clustering, scientific data can be seen as textual documents having overall as well as per-record descriptions. Bibliometric indexes on such data can be calculated to select and retrieve their essential concepts: features based on layout, document structure, and topic concepts are used to discriminate between related and unrelated pages; an overall page-set similarity measure is used for clustering disjoint groups of pages; automatic indexing techniques calculate correlations spanning multiple pages and use these correlations as feature space elements.
Periodicity, seasonality, and temporal cycle extraction from time series data. Time series can conceal events related, for example, to species behaviour, migrations, or capture seasons during a single year. Such periodic phenomena can be automatically revealed by machine learning techniques for automatic time cycle and periodicity extraction. Many techniques will be offered, including the widely used Singular Spectrum Analysis.
Anomaly pattern detection. Anomaly patterns represent dynamics, hidden within a certain observation sequence, which indicate uncommon behaviour or the presence of non-standard events in scientific data trends. The main aim of these techniques is to detect any subset of data that displays a significant increase in anomalous activity compared to the "normal" behaviour of the reference data. The model will be trained on reference data in order to establish a baseline of normal behaviour.
Computing association rules between tabular data. Association rule extraction techniques aim to reveal correlations between data, attributes, or entire sequences of observations. The most common techniques will be offered as modular libraries, such as the building of association rules from Frequent Patterns.
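As a concrete illustration of the clustering technique named above, here is a minimal k-means over 2-D feature vectors. It is a plain-Python sketch with fixed initial centroids for reproducibility; the sample points are invented, and a production version would use a library with proper initialisation:

```python
def kmeans(points, centroids, iterations=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups of (invented) two-feature observations.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
          (8.0, 8.2), (7.9, 8.1), (8.3, 7.8)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

On these data the algorithm separates the two groups and converges to centroids near (1.0, 1.0) and (8.1, 8.0).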
TJRA3.4: Data Visualisation and Simulation Facilities
Task leader: NKUA; Participants: CNR;
The main objective of this task is to integrate and enhance services for data set visualisation and simulation in order to enforce the policies identified by the EA-CoP and to deal with data type heterogeneity. In the context of the project, both generic and task-specific simulation and visualisation solutions are foreseen, depending on the needs and requirements raised.
With respect to visualisation, the task shall cover:
provision of visualisation of the content area with tools that will capture the semantic and structural relationships among documents deposited in Content Management or retrieved via Information Retrieval;
enhancement of generic document visualisation to cover new data types, integrated in the web front-end of the system;
support of scientific-domain specific visualisation, on special or generic data types (such as time series and geospatial and temporal data) as needed;
With respect to simulation, the intention is to use the algorithms delivered as libraries in task TJRA3.2, as well as other techniques, in order to forecast trends or to detect evidence of data accumulating in some direction. Some time series attributes could tend, for example, towards an accumulation value that may represent either a stable or an unstable situation. The system will also be able to infer the nature of such a point based on the data history. The following techniques will be offered:
Data series analysis and trend forecasting. Plotting data trends allows human experts to reason better about the future behaviour of some dynamics. Singular Spectrum Analysis and other techniques, such as Monte Carlo methods, will be involved. The latter in particular are typically used when it is unfeasible or impossible to compute an exact result with a deterministic algorithm.
Statistical methods for data distribution calculation. Statistical functions will be provided for classic statistical analysis. Plotting methods for data distribution visualisation will be provided, along with the possibility to calculate the expected values (mean, variance, etc.) for a certain numeric attribute according to various classic statistical distributions.
Aggregated data visualisation. Statistical analysis can be performed on a single time series or on several series. An aggregated data plot can be produced in order to graphically analyse data frequencies and accumulation, as well as the distribution of information along families and attributes. For time series data, a number of different representation formats can be used (Small Multiples, Time-Series Plots, Static State Replacement, Theme Rivers, etc.), while visualisation can generate additional knowledge not directly evident from the data sets (Periodic Pattern Identification, Spectrum Analysis, etc.). Moreover, a large part of the information space visualisation can be reduced to ontology visualisation techniques (Raised Surface designs, Information Landscapes, Graph Visualisation, etc.).
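The Monte Carlo methods referred to above can be illustrated with the textbook example of estimating a quantity by random sampling where a closed-form computation would be replaced by repeated draws (here π, a stand-in example; a seeded generator makes the run repeatable):

```python
import random

def monte_carlo_pi(samples=100_000, seed=42):
    """Estimate pi by sampling uniform points in the unit square and
    counting the fraction that falls inside the quarter circle."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

estimate = monte_carlo_pi()
```

The same sample-and-aggregate pattern applies when forecasting a trend whose generating process can be simulated but not solved analytically.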
TJRA3.5: Semantic Data Analysis Facilities
Task leader: FORTH; Participants: FAO;
Semantic data analysis for the EA-CoP aims to provide a semantic based infrastructure for marine living resources services integration. This task aims to deliver a set of libraries and services to bridge the gap between communities and link distributed data across community boundaries. In addition, it aims to provide the mechanisms to support users in composing analytic ‘services’ by selecting knowledge from multiple sources and connecting these locally to generate new interpretations and to align them across different communities.
For semantic data analysis to be effective, the iMarine Data e-Infrastructure will connect distributed data to consume knowledge across systems and community boundaries using web services. The introduction of the Semantic Web and the publication of expressive metadata in a shared knowledge framework enable the deployment of services that can intelligently use Web resources. Moreover, the iMarine Data e-Infrastructure will integrate many different service registries into a virtually unified "repository" that can serve knowledge-based querying facilities.
Deliverables
DJRA3.1-2 gCube Query Language Specification, containing the formal specification (in its first version in DJRA3.1 and in its final version in DJRA3.2) of the gCube QL (M6, M24) – Type: Report
DJRA3.3-9 iMarine Data Consumption Software, which contains the software and documentation of the components that comprise the Data Consumption suite (M6, M9, M12, M15, M18, M21, M24), Type: Other (online software repository)
Work package number: JRA4
Start date or starting event: M1
Work package title: Data e-Infrastructures Integration and Interoperability Facilities Development
Activity Type: RTD
Participants 1-7: ERCIM, CNR, NKUA, CERN, E-IIS, US, FORTH
Person-months per participant: 3, 7, 5, 5, 6, 6
Participants 8-14: Terradue, Trust-IT, FAO, FIN, UNESCO, CRIA, IRD
Person-months per participant: 4, 3
Objectives
The main objective of this work package is to develop a set of facilities for supporting the exploitation of the available and emerging facilities of the Data e-Infrastructure by neighbouring and external infrastructures. These facilities will manifest as sets of Application Programming Interfaces (APIs), policies, and interoperability cases. Planned activities target:
the definition of the integration and interoperability frameworks of gCube;
the identification and implementation of standards at the boundaries of the services and of the entire infrastructure, for interactions with its individual elements or with the infrastructure as a whole;
the provision of programmatic APIs that enable integration and interoperability, for and beyond the specifications adopted.
The main objective of this task is to define the general rules governing the production of the APIs for all the functional categories listed in the remaining WP tasks.
In particular, the following activities are planned:
definition of the principles and policies of Integration and Interoperability, identifying both the roadmap and the primary specifications (protocols) to be covered by the various facilities and tasks of the WP;
definition of a formal architectural view of the Integration and Interoperability Layer, as an evolution of the current Application Service Layer;
definition and implementation of the core enabling elements of the Integration and Interoperability Layer, as an evolution of the current Application Service Layer;
The main objective of this task is to define and provide the formal APIs for resources that fall under the functional category of "Data Management", according to the methodology defined by TJRA4.1.
More specifically, the task will provide multi-protocol APIs (e.g. Java, REST, SOAP, depending on need and relevance) and the related implementation components for the easy consumption of the facilities that fall in the Data Management area.
The main objective of this task is to define and provide the formal APIs for resources that fall under the functional category of "Data Consumption", according to the methodology defined by TJRA4.1.
More specifically, the task will provide multi-protocol APIs (e.g. Java, REST, SOAP, depending on need and relevance) and the related implementation components for the easy consumption of a number of facilities that fall in the Data Consumption layer, such as:
Search service and all the related operators
Indexing services
Data transformation services
Geospatial/temporal retrieval services that deal with both data retrieval and rendering
Semantic data management services
Data visualisation services
Data mining services
Simulation services
Deliverables
DJRA4.1 Integration and Interoperability Framework Definition, containing the methodology for the achievement of the objectives of the task and primary selected technologies and specifications for adoption (M3) – Type: Other (On-Line Document)
DJRA4.2-9 Integration and Interoperability Framework Design and Implementation Report, containing the architecture of Integration and Interoperability and the design of the components and services that comprise it (M6, M9, M12, M15, M18, M21, M24, M28) – Type: Other (On-Line Document), regularly updated
The specifications of the Data e-Infrastructure Management facilities are published in dedicated wiki pages.
MJRA1.6-10
Data e-Infrastructure Policy-oriented Security Facilities Specification
JRA1
M3, M9, M15, M21, M24
The specification of the facilities for AA and security related aspects are published in dedicated wiki pages.
MJRA1.11-15
Workflow Management Facilities Specification
JRA1
M4, M10, M16, M22, M25
The specification of the facilities for workflow management are published in dedicated wiki pages.
MJRA2.1-6
Data Access and Storage Facilities Specification
JRA2
M4, M6, M12, M18, M24, 26
The specification of the facilities for data access and storage are published in dedicated wiki pages.
MJRA2.7-11
Data Transfer Facilities Specification
JRA2
M6, M12, M18, M24, M27
The specifications of the facilities for data transfer are published in dedicated wiki pages.
MJRA2.12-16
Data Assessment, Harmonization, and Certification Facilities Specification
JRA2
M6, M12, M18, M24, M27
The specifications of the facilities for data assessment, harmonization, and certification are published in dedicated wiki pages.
MJRA3.1-6
Data Retrieval Facilities Specification
JRA3
M3, M6, M12, M18, M24, M26
The specifications of the facilities for data retrieval are published in dedicated wiki pages.
MJRA3.7-11
Data Manipulation Facilities Specification
JRA3
M6, M12, M18, M24, M27
The specifications of the facilities for data manipulation are published in dedicated wiki pages.
MJRA3.12-16
Data Mining Facilities Specification
JRA3
M6, M12, M18, M24, M26
The specifications of the facilities for data mining are published in dedicated wiki pages.
MJRA3.17-21
Data Visualisation Facilities Specification
JRA3
M6, M12, M18, M24, M26
The specifications of the facilities for data visualisation are published in dedicated wiki pages.
MJRA3.22-26
Data Simulation Facilities Specification
JRA3
M6, M12, M18, M24, M26
The specifications of the facilities for data simulation are published in dedicated wiki pages.
MJRA3.27-31
Semantic Data Analysis Facilities Specification
JRA3
M6, M12, M18, M24, M26
The specifications of the facilities for semantic-based data analysis are published in dedicated wiki pages.
MJRA4.1-5
Data Management APIs Specification
JRA4
M6, M12, M18, M24, M28
The specifications of the APIs for data management are published in dedicated wiki pages.
MJRA4.6-10
Data Consumption APIs Specification
JRA4
M6, M12, M18, M24, M28
The specifications of the APIs for data consumption are published in dedicated wiki pages.
Pert diagram
The diagram below depicts the main relationships between the various tasks of the Joint Research Activities. In particular, it presents how the tasks of three of the four JRA work packages, i.e. JRA1, JRA2 and JRA3, proceed in parallel to satisfy the requirements and approaches identified by the EA CoP in the context of the NA3 activities. The fourth work package, i.e. JRA4, will abstract over the pool of services and facilities developed by the other three work packages for data infrastructure operation, data management and data consumption, with the goal of producing application programming interfaces that simplify the development of applications and services benefitting from these facilities. The set of software artefacts produced by the JRA tasks is then passed to SA3, which takes care of integrating and testing them (TSA2.1) and of documenting and packaging them for distribution (TSA3.2). The software artefacts will be deployed in production (TSA1.1) as well as exploited to develop and operate Virtual Research Environments (SA2). From this activity, the JRA tasks will receive feedback on the effectiveness of the existing software packages and requests for further enhancements in a continuous interaction. In addition, further requirements and feedback will emerge in the context of NA3, resulting from the availability of successive versions of the iMarine data infrastructure.
Figure . Joint Research Activities Pert Diagram
Risk Analysis and Contingency Plans
A risk breakdown structure for the JRA activities is presented in the following table.
Table . Joint Research Activities Risk Analysis and Contingency Plan
Risk
Evaluation and Description
Contingency Plans
Foundation technology becomes obsolete
Internal; High; Medium impact
The gCube system is built on technologies released a few years ago. These may need to be replaced to keep gCube state-of-the-art.
The gCube services do not deal directly with these underlying facilities: a development framework (gCore) was implemented to isolate the services from the underlying layers. This thin framework will evolve accordingly to minimize the impact of this risk on the services.
Software is not released on time
Internal; Medium; High impact
This risk is very common in any project with a substantial plan of development activities. Instances of this risk would strongly affect the activities of all the other work packages.
The Agile development approach adopted in JRA will provide many opportunities to assess the direction of the project through incremental and iterative work cadences and short integration cycles.
The appropriate boards within the project will continuously monitor early signs of this risk and take corrective actions.
Community of Practice requirements cannot be implemented
Internal; Low; Medium impact
NA3 translates the CoP requirements into development goals that, for whatever reason, cannot be achieved by JRA.
Representatives of JRA will be included in the NA3 work package to assure the feasibility of the goals with respect to JRA requirements and effort.
EMI fails in its goals or to deliver software suitable for the project’s purposes
External; Low; Low impact
The European Middleware Initiative may fail to maintain the gLite and ARGUS software currently exploited by the gCube system. Given the participants' past experience, the probability of this risk is very low.
There are very few, well-identified points of contact between the gCube system and the gLite software. These can be modified at low cost to interface with other systems offering computing and storage capabilities.
In any case, Hadoop clusters will be available to compensate.
ARGUS, an authorization framework, can, if necessary, be disabled and carefully replaced with a similar technology in the context of TJRA1.2.