The objectives of the project are best understood in relation to the political and technological status quo in the target domain. This Section gives a selective overview of ongoing initiatives and available services with a view to showing the advances that the project will bring about.
Section summarises recent activities towards the implementation of the EA, illustrates the core challenges that limit overall progress, and advocates the importance of large-scale infrastructural support in meeting those challenges. It then outlines high-level requirements for an effective infrastructure, linking the success of the project as well as one of its core outcomes to, respectively, the involvement and promotion of the emerging EA-CoP.
Section focuses on the loose infrastructure of information sources and services that are currently available to the emerging EA-CoP, illustrates its actual strengths and inherent weaknesses, and relates the paradigmatic shift in infrastructure building pursued by the project to the delivery of data integration and processing foundations for the implementation of the EA.
Section motivates the technological and deployment choices that the project makes for its target data e-Infrastructure and discusses their implications for the EA-CoP and the development of data e-Infrastructures at large.
The Ecosystem Approach and the Emerging EA-CoP
The EA is formally understood as an extension of conventional fishery management, encompassing new features to deal with a more comprehensive mandate in relation to the ecosystem. The same holds true for resource conservation, as the “old” concept of emblematic or vulnerable species protection (e.g. through area protection) is extended to cover the maintenance of ecosystem biodiversity and the sustainable use of its goods and services. The conventional planning and management cycles are common to fisheries management and conservation, and both are required to use the best scientific information available to inform decision-making processes.
During the last decade, the EA implementation has been progressing through a number of activities, including: drafting of conceptual and operational guidelines and plans (by FAO, WWF, CBD, etc.); developing ecosystem models and better understanding of ecosystemic impacts of resource uses; developing catalogues of ecosystem indicators; promoting the application of risk analysis, management and communication; testing of marine protected areas; reducing fisheries impacts; developing inter-agency collaborations, e.g. between FAO, CITES, UNEP and the CBD; developing ecolabelling; building capacity through dedicated field projects. Research has evolved significantly, increasing focus on collateral impacts of ecosystem use, ecosystem protection devices, food chains, resilience, valuation of ecosystem goods and services, and more generally on the interactions between the human and natural sub-systems of the ecosystem. These activities have generated a range of small “clusters of interest” (e.g. formal or semi-formal working groups and email groups) around EA-related subjects (sea mounts, coral reefs, high seas, marine protected areas)1 or methodologies (e.g. modelling, mapping, atlases).
Despite these results, the EA implementation is still rather slow and will not be able to meet the deadlines adopted by the World Summit on Sustainable Development for 2012. The overall activity remains fragmented and chaotic, and an EA-CoP is only slowly emerging; the clusters function asynchronously and in a fashion that is uncoordinated at best, resulting in slower than possible progress, incoherence, and a large gap between endowed and low-capacity areas.
Three factors among others are slowing down or complicating the implementation:
- the greater amount of data needed to deal with the larger range of issues in scope, within tighter timeframes and respecting higher quality standards. In line with the precautionary approach, the use of risk analysis is proposed as a way to cope with the resulting growing uncertainty. In addition, where capacity does exist, ecosystem modelling can be used to develop and compare management scenarios;
- the difficulty of predicting with accuracy the outcome of management measures in complex social-ecological systems. In line with “good governance” principles, an adaptive management process is proposed as a way to proceed with the information available and to adapt the management set-up as new knowledge is obtained. In the context of EAF, a monitoring capacity based on the ability to follow the trends of “live indicators” as a basis for corrective actions would provide a response. Also, the way to limit costly errors in this trial-and-error process is to use tested best practices and to exchange information on successes and failures as rapidly as possible. The need to work on as many comparable ecosystems as possible implies establishing such exchange at regional and global levels;
- the insufficient research and management capacity available, not only in the developing world, to deal with the large range of fisheries, ecosystem types, and jurisdictions. This calls for simplified access to a large range of information resources and analytical methods, and for relief from the burden of maintaining complex systems.
It is widely agreed that tackling these challenges requires the support of a large-scale, distributed infrastructure of information resources, from data sources to information services, which can address three key requirements for science-based decision-making: (i) access to basic or reference data; (ii) availability of tools for data processing; and (iii) diffusion of results beyond the strict decision and publication processes. In particular, the expectation is for an infrastructure that can: facilitate the federation, access, validation, and processing of data and information at ecosystem level and at the required scale; accelerate the exchange of information, tools, experiences, and best practices; and compensate for the reduced capacity available in some areas and fisheries through better information exchange and catalogues of best practices shared between more and less endowed areas.
As it sets out to build and operate a data infrastructure that meets these requirements, the project acknowledges that its success rests on the involvement of existing clusters of interest around the EA, hence its reliance on information system specialists for governance (the Board) and on representatives from interest groups for validation and promotion within the broader community (the Advisory Council). Conversely, as the result of building and operating an infrastructure that meets the requirements above, the project will provide an opportunity and a strong incentive for existing clusters of interest to rally around concrete data and processing foundations, to exert influence on their governance, and to form in the process a more functional, multidisciplinary, and influential nucleus out of which a European and a global EA-CoP may organically grow.
In particular, the project will assist in organizing for the EA-CoP a wide range of data and information in interoperable databases and knowledge bases, ontologies, glossaries, digital libraries (with as free access as possible) and other repositories. It will provide facilities to help organize the assessment-and-decision process, including e-meeting facilities, wikis, and e-training, thereby improving on the present image of a fragmented, chaotic, and poorly accessible collection of information resources. It will offer simultaneous access to many data sources, improving on the present use of conventional search engines. It will enhance the role of portals dedicated to the EA, promoting the elaboration of coherent fisheries and conservation policies and strategies. It will also facilitate the participation of sector representatives and actors in a more public and transparent debate. Last but not least, the project will promote scientific collaboration across disciplines and ecosystems, pooling the rare competencies available at regional level and worldwide.
Overall, the project will provide a forum where key representatives of the diverse communities that support the EA will develop policies for sharing data, applications, and hardware resources within a single infrastructure. In doing so, the project will also provide a kernel and a catalyst to the emergence of a unified EA-CoP, and possibly the foundations of its information and knowledge exchange partnership component.
Infrastructures for the Emerging EA-CoP
Today, the EA can leverage network access to a substantial amount of relevant data sources and information services.
Some of the traditional data types are relatively well represented online, most notably those that concern the classification, naming, description, spatiotemporal distribution, and natural environment of marine species. Popular online services give access to a large body of biodiversity data (cf. OBIS, FishBase, SeaLifeBase, FIGIS), others publish bathymetric data (e.g. Virtual Ocean, GEBCO, ETOPO), yet others disseminate oceanographic and atmospheric data (such as ICOADS and the US NODC with the World Ocean Atlas and World Ocean Database); some services offer interactive interfaces to mapping tools (e.g. IMAPS). In some cases, the services federate data collated at different scales or around different themes, moving data or queries under standard protocols and formats. Often they participate in broader federations in turn, feeding their data to global services further upstream (e.g. GBIF, Encyclopedia of Life). Collectively, these services form an infrastructure of data providers and data publishers that align on metadata standards (e.g. CSDGM, GCMD, OceanPortal), service discovery and query protocols (e.g. DiGIR, TAPIR), data exchange formats (e.g. ABCD, Darwin Core, the OBIS Schema), and data dissemination protocols (e.g. WMS, WFS).
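To make the exchange formats above concrete, the following is a minimal sketch of consuming a Darwin Core-style occurrence record of the kind these services align on. The sample record and its wrapping element names are illustrative only, not the actual output of any service mentioned above.

```python
# Hedged sketch: parsing a minimal Darwin Core-style occurrence record.
# The record structure here is a simplified illustration; real services
# expose richer records under their own envelopes and protocols.
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core term namespace

SAMPLE = f"""
<SimpleDarwinRecordSet xmlns:dwc="{DWC}">
  <SimpleDarwinRecord>
    <dwc:scientificName>Thunnus albacares</dwc:scientificName>
    <dwc:decimalLatitude>-2.5</dwc:decimalLatitude>
    <dwc:decimalLongitude>55.0</dwc:decimalLongitude>
  </SimpleDarwinRecord>
</SimpleDarwinRecordSet>
"""

def parse_occurrences(xml_text):
    """Return a list of (name, lat, lon) tuples from a record set."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("SimpleDarwinRecord"):
        name = rec.findtext(f"{{{DWC}}}scientificName")
        lat = float(rec.findtext(f"{{{DWC}}}decimalLatitude"))
        lon = float(rec.findtext(f"{{{DWC}}}decimalLongitude"))
        records.append((name, lat, lon))
    return records

occurrences = parse_occurrences(SAMPLE)
```

Because the terms are drawn from a shared namespace, any consumer that understands Darwin Core can process records from multiple providers without per-provider parsing logic, which is precisely what makes the federations above workable.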
A large number of ongoing projects and initiatives contribute to the constant expansion of the infrastructure in terms of coverage, services, tools, and data exchanges between services. Species 2000, Species 2000 Europa and the Catalogue of Life – a collaboration between Species 2000 and the Integrated Taxonomic Information System (ITIS) – are building a comprehensive register of taxonomic names, including classification and synonymy. i4Life is an initiative to create tools that automate the integration of species lists from different sources. The Pan-European Species directories Infrastructure (PESI) provides standardised and authoritative taxonomic information by integrating and securing Europe’s taxonomically authoritative species name registers and nomenclators (name databases), together with the associated expert networks that underpin the management of biodiversity in Europe. The World Register of Marine Species (WoRMS) was developed to support the data management needs of the MarBEF Network of Excellence and built on the information collated by another European project, the European Register of Marine Species. Aquamaps builds on OBIS and GBIF data to create range maps of aquatic species by extrapolating known occurrences to areas with suitable environmental conditions.
For other types of data, online accessibility, aggregation and exploitation are considerably more limited. To varying degrees, this is true of open-access bibliographic data (but see ASFA, Aquatic Commons, OceanDocs), of expertise records (but see OceanExpert) and – most crucially to the vision of the EA – of statistical data, assessment data, and socio-economic data, where the main problems have less to do with coverage and consistency than with timeliness of publication, systemic interoperability, provenance recording, and overall data quality across consolidation chains. The premises for infrastructural support towards the collation and exchange of statistical data already exist, from data sources available at local, regional, and global levels – such as FIGIS, ISTAM, and FIRMS – to widely endorsed standards for data exchange, publication, discovery, and notification, most notably the Statistical Data and Metadata eXchange (SDMX) standards. Infrastructural developments in this direction are still in their infancy, however, and there are pressing issues related to streamlining the adoption of standards and the reuse of standards-based service implementations and tools – from registry implementations to data conversion, ingestion and presentation tools – across both providers and consumers of statistical data.
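As an illustration of the standards-based access envisaged here, the sketch below composes a data query following the SDMX RESTful API conventions, in which a series key is a dot-separated list of dimension values and an empty position acts as a wildcard. The base URL, dataflow identifier, and dimension values are hypothetical placeholders, not references to any existing endpoint.

```python
# Hedged sketch: building an SDMX-style REST data query.
# All identifiers below (host, dataflow, dimension codes) are invented
# for illustration; only the URL pattern follows the SDMX REST conventions.
def sdmx_data_url(base, flow, key_dims, start=None, end=None):
    """Build an SDMX REST data query URL.

    key_dims is an ordered list of dimension values; an empty string
    acts as a wildcard for that dimension, per the SDMX key syntax.
    """
    key = ".".join(key_dims)                 # e.g. "TUNA..TONNES"
    url = f"{base}/data/{flow}/{key}/all"    # "all" = any data provider
    params = []
    if start:
        params.append(f"startPeriod={start}")
    if end:
        params.append(f"endPeriod={end}")
    if params:
        url += "?" + "&".join(params)
    return url

# Hypothetical query: one species, all areas, quantities in tonnes.
url = sdmx_data_url(
    "https://stats.example.org/sdmx", "DF_CAPTURE",
    ["TUNA", "", "TONNES"], start="2000", end="2010",
)
```

The value of such a convention is that the same query logic works against any compliant provider, which is what would let consolidation chains for statistical data be automated rather than renegotiated source by source.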
Even where the infrastructure is most developed, however, the bottom-up and uncoordinated approach that characterises its deployment curtails the possibilities for cost-effective, transparent, and innovative exploitation. Its services are portals: they are typically designed to disseminate a particular type of information and share data opportunistically on the basis of metadata and protocol standards. This leaves little room for true back-end services with general-purpose and reusable data management functions, including storage, description, annotation, provenance, mediation, indexing, retrieval, transfer, and transformation. Similarly, it dispenses altogether with standard middleware services, including: (i) resource management services, which are key to optimised, transparent, and cost-effective use of the available resources; (ii) resource publication, notification, and discovery services, which are key to defining applications that make dynamic use of the available resources; (iii) process and workflow execution services, which are key to synthesizing applications out of existing functionality; and (iv) security services, which are key to bringing sensitive data (e.g. Abundance Assessment or Control and Surveillance data) within the infrastructure, where it may be aggregated, qualitatively analysed, or otherwise “anonymized” towards the synthesis of Open Access artefacts.
Thus the primary mode of data access is currently interactive and there is little support for building applications and workflows that integrate, cross-reference, post-process, analyse and more generally synthesise new knowledge from the available information. Without such applications and workflows, it is unclear how to achieve the automation, transparency, and timeliness that are required to turn the infrastructure into a credible platform for support to policy development and decision-making. It is equally unclear how goals of cost-effectiveness, sustainability, performance, and scale may be successfully addressed if the scope for resource sharing within the infrastructure is limited to data and excludes applications, services, computing resources, and human resources dedicated to the governance and administration of the infrastructure.
These observations motivate a more integrated approach to infrastructure building, one in which the infrastructure is conceived as a set of dedicated hardware resources, cross-application software services, and domain-specific data sources which are made available: (i) to a broad class of stakeholders, (ii) under the strict governance of a set of policies and the routine administration of dedicated human resources, and (iii) for the implementation of a class of related processes. This is an e-Infrastructure and the project will deploy and operate one whose policies and services are strongly oriented towards governance and support of data management processes, i.e. a data e-Infrastructure. In particular, the project will integrate the strengths of the existing infrastructure – data sources, information services, metadata and protocol standards, sharing policies – into a data e-Infrastructure of shared resources which may serve as a suitable platform for the implementation of the EA.
Enabling Software Technologies
Initiatives for the deployment of general-purpose e-Infrastructures are well underway across the globe, and have been for a number of years. Long-term national and international deployment efforts may be found in Europe (GÉANT, EGEE/EGI, DEISA, NGS, D-Grid, NDGF), the United States (TeraGrid, OSG), China (CROWN), Japan (NAREGI), India (Garuda), Australia (APAC), and the countries of the Pacific Rim (PRAGMA).
These efforts differ in terms of available resources, scope of resource distribution, resource topology, and application domain. Yet they are all built on the common assumption that the infrastructure pools its resources from a number of diverse and autonomous administrative domains, in accordance with the principles of controlled resource sharing that characterize Grid systems [7]. From Grid computing, most infrastructures inherit an orientation towards computation-intensive and data-intensive batch processes, i.e. tasks, such as those required in modern, large-scale scientific research. Data management is then concerned with the federation and optimal placement of large data sets, and this is reflected in services for distributed storage, transfer, and replication of data.
A notable exception is D4Science, an e-Infrastructure that adopts the federative model of Grid systems but differs substantially from such systems in terms of service offering. In particular, D4Science relies on the services of gCube, an open-source distributed system specifically designed to operate data e-Infrastructures with the following key properties:
- a strong orientation towards service-oriented applications. Applications in the infrastructure take the form of Virtual Research Environments, i.e. dynamically and interactively defined aggregations of data collections and services with interfaces towards a variety of actors, from end-users to administrators. Collections and services are drawn from longer-lived resource pools allocated to Virtual Organizations, i.e. virtual domains with dedicated administrative interfaces. Virtual Research Environments and Virtual Organizations define scopes within the infrastructure, and resources may be confined to scopes or selectively shared across scopes.
- the provision of a rich array of general-purpose data management services, associated libraries, and interactive interfaces ready for inclusion in Virtual Organizations and Virtual Research Environments. This includes services for importing, storing, indexing, accessing, searching, transforming, describing, and annotating data. Collectively, these services support a high-level, application-oriented notion of data infrastructure.
- the ability to deploy Virtual Organizations and Virtual Research Environments on demand, and the ability to manage the lifetime of their services in an autonomic fashion. There are middleware services that compute the dependency closure of selected services, find the best match between service requirements and hardware capabilities, and deploy accordingly a number of service instances. They then monitor the activities of the deployed instances, distribute their load, and deploy new instances in response to failures or overload. Collectively, these services support a model of resource management that aligns with the service-orientation of the infrastructure and departs from the conventional management models found in task-oriented infrastructures.
- the ability to stage, execute, and monitor declarative specifications of sophisticated workflows, where individual execution steps may entail the invocation of services or the execution of scripts, binaries, and tasks. Most notably, the infrastructure can dynamically outsource the execution of individual steps to external infrastructures so as to exploit the task-orientation of their resource management regimes. In this sense, the infrastructure reconciles tasks and services as equally important models of process execution.
- the ability to adapt to a wide range of community-specific requirements, both in terms of presentation and back-end logic. The infrastructure offers a range of extension and customization mechanisms in addition to, and at lower cost than, the development of new application services. All pre-defined services and service front-ends can be dynamically configured; where appropriate, services are designed as execution engines of declarative specifications (e.g. for process execution, as discussed above), and some services can be dynamically extended with service plug-ins. Effectively, the infrastructure offers an open development platform to its communities of adoption.
- the ability to extend its core functions beyond the boundaries of the infrastructure. Key middleware and data management services are built with interoperability mechanisms that allow new functionality to be dynamically plugged within existing services and service frameworks, and which capitalize on relevant standards to maximize reach and exposure outside the infrastructure.
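The scoping model behind the first property can be pictured with a toy sketch: resources live in a Virtual Organization's pool and may be confined there, or selectively shared with the Virtual Research Environments defined under it. This mirrors the concept only; the class and method names are invented and do not reflect gCube's actual API.

```python
# Toy sketch of scope-based resource sharing, assuming a simplified model:
# a VO holds a resource pool; each resource records which VREs may see it.
# Names are hypothetical; this is not gCube code.
class VirtualOrganization:
    def __init__(self, name):
        self.name = name
        self.resources = {}   # resource id -> set of VRE names it is shared with

    def register(self, resource_id):
        """Add a resource to the VO pool, initially confined to the VO."""
        self.resources[resource_id] = set()

    def share(self, resource_id, vre_name):
        """Selectively share a pooled resource with one VRE scope."""
        self.resources[resource_id].add(vre_name)

    def visible_in(self, vre_name):
        """Resources a given VRE may see within this VO's pool."""
        return sorted(r for r, vres in self.resources.items() if vre_name in vres)

vo = VirtualOrganization("EA-VO")
for rid in ("catch-stats", "species-maps", "survey-raw"):
    vo.register(rid)
vo.share("catch-stats", "ICIS")
vo.share("species-maps", "ICIS")
vo.share("species-maps", "AquaMaps")   # one resource shared across two scopes
# "survey-raw" stays confined to the VO pool, visible in no VRE.
```

The point of the model is that sharing decisions are made per resource and per scope, so sensitive data can coexist in the same pool with openly shared collections.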
Collectively, these properties identify gCube as a suitable technological substrate for the class of data-oriented processes that characterize the implementation of the EA. Thus its services are selected as primary components for integration within the data e-Infrastructure that the project sets out to build and operate. This selection extends naturally to the context in which gCube is currently deployed. Based on the interoperability mechanisms built within some of its key services, D4Science acts today as the enabling component of a federation of e-Infrastructures that includes, among others, EGEE, DRIVER, GENESIS-DR/DEC, and INSPIRE. Within the federation, resources that originate in individual infrastructures propagate to the others on demand and at a contained cost; this complements the strengths of each infrastructure, broadens its coverage, suggests innovative exploitation, and accelerates adoption. The project will thus leverage the same interoperability mechanisms to join its own data e-Infrastructure to the D4Science federation, thus partaking of its continuous cycle of mutual exchange and benefit. In particular, its data e-Infrastructure will make available to the EA-CoP computational, content, and functional resources that are available in the federation as a whole (e.g. Earth Observation datasets aggregated in GENESIS-DR/DEC or the computational resources pooled in EGEE). Conversely, the data e-Infrastructure will expand the overall capacity of the federation to include the content and functionality that will be directly or indirectly published from within its own data e-Infrastructure.
The implications for the EA-CoP of a gCube-based data e-Infrastructure that operates within a federation of e-Infrastructures are well illustrated by three Virtual Research Environments in D4Science:
- the Aquamaps Virtual Research Environment generates species distribution maps through sophisticated analyses of data integrated from a variety of relevant sources. In doing so, it mirrors the activity of the popular Aquamaps service, but it relies on the infrastructure to integrate data sources available in the D4Science ecosystem and to outsource the execution of computationally intensive algorithms. As a result, the Virtual Research Environment can feed back to the external service maps that are more precise, accurate, synthetic, and predictive than those it currently disseminates.
- The FCPPS Virtual Research Environment (for Fishery and Country Profiles Production System) generates balanced and synthetic reports of the status of fisheries and aquaculture in a given country. The reports support decision-making processes within the sector and promote advocacy in the sustainable use and conservation of aquatic resources. Their generation hinges on complex aggregation and editing of continuously evolving multi-lingual data from heterogeneous sources. The infrastructure makes available additional data sources to the process and reduces dramatically the time required for report generation, to the extent that updates can be computed and disseminated as often as the community requires.
- The ICIS Virtual Research Environment (for Integrated Capture Information System) integrates regional and global capture and distribution information on aquatic species from Regional Fishery Management Organizations (RFMOs) and international organizations (FAO, WorldFish), and exposes it to both interactive and programmatic access. ICIS allows for the configuration of algorithms and filters for parameters such as area, species distribution and habitat, providing a harmonized view of catch statistics and allowing the community to overlay datasets according to pre-defined reallocation rules. To ensure broad interoperability, ICIS makes use of existing international standards, including those agreed at the Coordinating Working Party (CWP) on Fishery Statistics, Open Geospatial Consortium specifications, and the ISO 19115 geographic metadata standard.
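The kind of harmonisation ICIS performs can be sketched in miniature: capture records from different sources are aggregated by species and area after a reallocation rule maps source-specific area codes onto a common classification. The codes, the rule, and the figures below are all invented for illustration; they are not ICIS's actual rules or data.

```python
# Toy sketch of catch-statistics harmonisation, assuming a hypothetical
# reallocation table that maps source-specific area codes onto a common
# CWP-style area classification. All codes and tonnages are invented.
from collections import defaultdict

REALLOCATION = {"IOTC-W": "51", "IOTC-E": "57", "FAO-51": "51"}

def harmonise(records):
    """records: iterable of (source_area, species, tonnes) tuples.

    Returns totals keyed by (species, common_area)."""
    totals = defaultdict(float)
    for source_area, species, tonnes in records:
        area = REALLOCATION.get(source_area, source_area)
        totals[(species, area)] += tonnes
    return dict(totals)

totals = harmonise([
    ("IOTC-W", "YFT", 120.0),   # regional body figure
    ("FAO-51", "YFT", 30.0),    # global figure mapped to the same area
    ("IOTC-E", "YFT", 55.0),
])
```

Declaring the reallocation rule as data rather than code is what lets a community configure and audit the harmonised view, instead of hard-wiring one organisation's conventions.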
While these Virtual Research Environments serve as convincing proofs of concept – and are rapidly evolving into production-class tools for the emerging EA-CoP – much development and integration work is required to extend and adapt the service-based framework provided by gCube towards the data types, policies, and processes that are key to the implementation of the EA. The precise nature of this work is outlined in Section , but it is clear that the information services mentioned in Section – and the open-source implementations already available within the EA-CoP for the protocols, data standards, metadata standards and profiles that pertain to the operation of those services – become key candidates for integration within the data e-Infrastructure. This is the case, for example, of FAO’s OpenSDMX implementations of the SDMX standards for modeling, accessing, and discovering statistical data; in this sense, the data e-Infrastructure will be able to interface with external SDMX infrastructures and will act as an SDMX-based infrastructure in its own right. This is also the case for existing implementations in the domain of geospatial data modeling, access, search and publication (e.g. the GeoNetwork catalogue and its map generation and visualization services) and for the management, dissemination, and consumption of structured data in “semantic” forms, such as RDF/N3, OWL, and Linked Data (e.g. the Triplify plugin for web applications, the D2R server, the Virtuoso semantic store, the Silk Linking Framework, the Sparallax browser for SPARQL-compliant search servers, GeoNetwork’s Ontology service, and others).
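A minimal sketch can show what exposing structured records in a “semantic” form amounts to in practice: serialising key/value fields as RDF triples in N-Triples syntax, in the spirit of tools like Triplify or D2R. The subject and property URIs below are hypothetical placeholders, not terms from any published vocabulary.

```python
# Hedged sketch: emitting a structured record as RDF triples in
# N-Triples syntax. URIs and property names are invented examples.
def to_ntriples(subject_uri, fields):
    """fields: dict mapping property URI -> literal value."""
    lines = []
    for prop, value in sorted(fields.items()):
        # Escape backslashes and quotes in the literal, per N-Triples rules.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{prop}> "{literal}" .')
    return "\n".join(lines)

triples = to_ntriples(
    "http://data.example.org/species/yft",
    {
        "http://example.org/terms/scientificName": "Thunnus albacares",
        "http://example.org/terms/fao3Alpha": "YFT",
    },
)
```

Once records are in this line-oriented form they can be loaded into any triple store (such as the Virtuoso store mentioned above) and interlinked with other datasets, which is the mechanism Linked Data publication relies on.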
In summary, by integrating gCube with services and components already available within the EA-CoP, the project will make advances in both the problem domain and the solution domain. As far as the first is concerned, the project will develop and integrate technological solutions – services and applications – which are essential to the implementation of the EA in the context of its data e-Infrastructure. As far as the second is concerned, the project will consolidate and enhance the capabilities of gCube as an enabler of general-purpose data e-Infrastructures. In turn, this will increase the returns on investment in building the e-Infrastructures that make or will make use of its services, directly (as for D4Science) or indirectly (as for the infrastructures in the D4Science federation).