The aim is to establish a shared data infrastructure, open to all interested parties, in which each data provider retains complete control over its own data.
Data Providers
It is important to determine the target data providers for ABBIF’s initial phase. The focus should be on specimen and species data, and the organization of data providers must be country-driven, meaning that the articulation and involvement of the different providers will be carried out nationally.
Biological collections, due to the nature of their activities, are information centers. They must have sufficient infrastructure and expertise to set up their own information system for internal purposes. Those that also have the necessary infrastructure and expertise to hold an internet information system available 24 hrs a day can serve their data directly to the network. Those that don’t have or don’t want to maintain dynamic links should have a mechanism to submit, alter, and delete their data at a regional server (or cache node).
Figure 2 shows a diagram of the network.
Figure 2. Component data provider: biological collections
Collections with dynamic links and regional servers must adopt compatible standards and protocols and must be held in institutions capable of maintaining the system and serving data through fast Internet connections.
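The submit, alter, and delete mechanism that a regional server (or cache node) should offer to collections without dynamic links might be sketched as follows. This is only a minimal in-memory illustration: the class name `RegionalCacheNode`, its method names, and the choice of keying records by collection code and catalog number are assumptions made for the example, not part of the ABBIF proposal.

```python
from dataclasses import dataclass, field


@dataclass
class RegionalCacheNode:
    """Hypothetical in-memory stand-in for a regional server (cache node)."""

    # Records are keyed by (collection code, catalog number); this key is an assumption.
    records: dict = field(default_factory=dict)

    def submit(self, collection: str, catalog_number: str, record: dict) -> None:
        """Add a new record; refuse to overwrite an existing one."""
        key = (collection, catalog_number)
        if key in self.records:
            raise KeyError(f"record {key} already exists; use alter()")
        self.records[key] = dict(record)

    def alter(self, collection: str, catalog_number: str, changes: dict) -> None:
        """Update fields of an existing record (raises KeyError if it is missing)."""
        self.records[(collection, catalog_number)].update(changes)

    def delete(self, collection: str, catalog_number: str) -> None:
        """Withdraw a record, so the provider keeps control over its data."""
        del self.records[(collection, catalog_number)]

    def records_for(self, collection: str) -> list:
        """Return all records currently mirrored for one collection."""
        return [rec for (coll, _), rec in self.records.items() if coll == collection]
```

A collection would call `submit` once per record, `alter` when a determination or locality is corrected, and `delete` to withdraw a record. A real node would also need persistence and authentication, which are left out here.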
Observation data and taxonomic descriptions represent two other groups of data providers, made up of individuals or research groups. In this case, data custodians must offer facilities where researchers can deposit their data for full and open access on the Internet. This is not a task for amateurs. There must be specialized staff whose main activity is the development and maintenance of information systems that guarantee the preservation and dissemination of data.
Based on the concept of open and free access to non-sensitive data, this element of the network will be called the digital data commons space6. The network may have more than one server to guarantee the necessary infrastructure for preservation, maintenance, retrieval, and dissemination of the data. Internet connectivity must be stable and fast (figure 3).
Figure 3. Architecture element: digital data commons space
This element could involve stakeholders from the conservation community with important observation data that are normally disseminated through books and reports.
Portal
GBIF today has a data index that serves data to the system. A subset of over 96 million records, with name and locality data, is harvested from 174 data providers and maintained in a centralized database. This makes the basic search much quicker and avoids problems such as slow or unstable connectivity. After carrying out the basic search, the user obtains a list of providers with the number of records found for each. Users can then display the list of records corresponding to each provider. Users can also download the selected records, either directly from the data providers or from the GBIF index (faster), and may select the format of the downloaded file. A map showing the distribution of the requested records can also be produced dynamically.
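The basic search over a centralized index, returning a list of providers with the number of records found for each, might look roughly like the sketch below. The provider names, the sample rows, and the functions `search_index` and `records_from` are invented for illustration and do not reflect GBIF's actual implementation.

```python
from collections import Counter

# Hypothetical harvested subset: (provider, scientific name, locality).
INDEX = [
    ("provider-a", "Bertholletia excelsa", "locality 1"),
    ("provider-a", "Bertholletia excelsa", "locality 2"),
    ("provider-b", "Bertholletia excelsa", "locality 3"),
    ("provider-b", "Euterpe oleracea", "locality 4"),
]


def search_index(index, name):
    """Return {provider: number of indexed records whose name contains `name`}."""
    needle = name.lower()
    return dict(Counter(prov for prov, sci, _ in index if needle in sci.lower()))


def records_from(index, provider, name):
    """List the indexed records behind one provider's count."""
    needle = name.lower()
    return [row for row in index if row[0] == provider and needle in row[1].lower()]
```

Because the search touches only the central index, its speed does not depend on the connectivity of the individual providers; the trade-off is that the index holds only the harvested subset of fields.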
CRIA developed a fully distributed system. When a query is processed, it is sent out to the providers, which search their databases and dynamically return the results. At the moment, the speciesLink Network has 6 regional servers (mirroring data from 38 collections), 2 collections with dynamic links, one centralized database with observation data (at CRIA), and one centralized information system of microbial collections (with 9 collections). This architecture is interesting for advanced users, who can search any field and retrieve the full data set as a file. Speed and the “fragility” of the network are disadvantages. If a server is off-line for any reason, that “branch” of the network will be unavailable. Maps are also produced dynamically.
CRIA has also developed an indexing service over a subset of the data, which is used for data cleaning. CRIA is currently planning to let users search this index, giving faster results and a more stable system. The distributed search system will continue to be offered, however, as it is very powerful and important to advanced users.
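The behaviour of a fully distributed query, including what happens when one “branch” is off-line, can be illustrated with a small sketch. Everything here (`make_provider`, `distributed_search`, the `ProviderOffline` exception, and the sample records) is hypothetical and stands in for real network calls to provider servers.

```python
from concurrent.futures import ThreadPoolExecutor


class ProviderOffline(Exception):
    """Raised by a simulated provider whose server cannot be reached."""


def make_provider(records, online=True):
    """Build a fake provider: a function that searches its own records on any field."""
    def query(field, value):
        if not online:
            raise ProviderOffline()
        return [rec for rec in records if rec.get(field) == value]
    return query


def distributed_search(providers, field, value):
    """Send the query to every provider in parallel and collect what comes back."""
    results, unavailable = {}, []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(q, field, value) for name, q in providers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except ProviderOffline:
                # This branch of the network contributes nothing to the answer.
                unavailable.append(name)
    return results, unavailable
```

The sketch shows both properties described above: any field can be searched, since each provider queries its full records, but the answer is incomplete whenever a provider is unreachable.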
Tools
Another important activity is the development of tools for data providers and users. These tools should preferably be developed as web services so that they can be used freely at all levels (local, country, and regional).
Data archive
An important element to be addressed in the network design is long-term data archiving. This may also be a task for country data custodians or their partners. It is important that the scientific council discuss this issue in order to set priorities as to what data should be added to a permanent archive and to identify an institution or a pool of institutions responsible for this activity.
Figure 4 below presents a diagram of the system.
Figure 4. Diagram of the system
Standards
In order to build a distributed or combined system, basic standards must be used. Of immediate interest to ABBIF are DarwinCore, ABCD (Access to Biological Collection Data), DiGIR, BioCASe, and TAPIR. The report on the proposed architecture offers a brief description of each one. Basically, the DarwinCore/DiGIR combination of data model and communication protocol has been adopted by information systems in the Americas, and ABCD/BioCASe has been adopted in Europe. TAPIR is being developed to meet the needs of both the DiGIR and BioCASe protocols.
Information systems of Amazonian countries were analyzed to see whether an interoperable system could be developed.
Brazil
Brazil has two very important projects underway that are of direct interest to ABBIF: the speciesLink network7 and PPBio – MCT8, the biodiversity research program of the Ministry of Science and Technology. The speciesLink network involves 40 collections, one centralized information system of observation data from São Paulo State (SinBiota9) and one centralized network with 9 microbial collections (SICol10). The network adopted DarwinCore and DiGIR as data model and protocol. PPBio of the Ministry of Science and Technology is adopting the speciesLink architecture as a model. Data from the network could be immediately linked to ABBIF.
Bolivia
Bolivia does not have an on-line information facility in place but the Noel Kempff Mercado is willing to participate in this effort. Because of slow connectivity, the best solution in a first stage would perhaps be for Bolivian collections to deposit their non-sensitive data in a regional server with good connectivity. Capacity should be built in order to address the problems associated with connectivity and biodiversity informatics.
Colombia
Colombia has a GBIF node established at the Alexander von Humboldt Biological Research Institute, responsible for the Biodiversity Information System SIB (Sistema de Información sobre Biodiversidad11). The interface for distributed searches is not available yet.
SIB’s communication protocol was developed concurrently with DiGIR and is now aiming at compatibility. The “Standard for exchanging biodiversity information at the organism level” (Estándar para intercambiar información sobre biodiversidad al nivel de organismos) was built on the mandatory elements indicated in Darwin Core V2 and on some data elements proposed in the Estándar para la documentación de registros biológicos developed by SIB. It now seems clear that a common protocol is needed in order to share data with other initiatives, and GBIF recommended that SIB use TAPIR, which should be ready for testing in the near future. Meanwhile, in order to integrate SIB into the network, it may be necessary to implement some translation routines from one standard to another, which is not a big problem.
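A translation routine of the kind mentioned above can be as simple as a field-name mapping applied record by record. The sketch below assumes invented local field names (`nombre_cientifico`, `pais`, `numero_catalogo`) and Darwin Core-style target names; neither side is taken from the actual SIB standard or checked against a specific Darwin Core version.

```python
# Hypothetical mapping from local field names to Darwin Core-style element names.
FIELD_MAP = {
    "nombre_cientifico": "ScientificName",
    "pais": "Country",
    "numero_catalogo": "CatalogNumber",
}


def translate(record, field_map=FIELD_MAP):
    """Rename known fields; return (translated record, fields with no mapping)."""
    translated, unmapped = {}, {}
    for key, value in record.items():
        if key in field_map:
            translated[field_map[key]] = value
        else:
            # Keep unmapped fields separate so that nothing is silently lost.
            unmapped[key] = value
    return translated, unmapped
```

Returning the unmapped fields separately makes it easy to see which local elements have no counterpart in the target standard and need a decision from the data provider.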
Ecuador
Ecuador does not have an information system responsible for integrating data from biological collections, but the herbarium of the Pontificia Universidad Católica del Ecuador (QCA) is in the process of digitizing its data and is willing to share non-sensitive data through the Internet. This is a case where a regional server could be installed to start the process of sharing biological collection data in Ecuador. The herbarium would either have to adopt data model standards such as DarwinCore, or some translation routines would have to be implemented.
French Guiana
Although French Guiana is an overseas department of France and, consequently, is politically a part of Europe, it is located in South America, within the Amazon region, and for this reason was included in the analysis of Amazonian countries. The “Herbier de Guyane (CAY)”, a center of the Institut de Recherche pour le Développement (IRD) in Cayenne, adopted the data model RIHA (Réseau Informatique des Herbiers Africains), which is compatible with ABCD. BioCASe is used as the communication protocol, and CAY already serves data through GBIF.
Peru
Peru has developed Siamazonia12, the information system for biological and environmental diversity of the Peruvian Amazon (Sistema de Información de la Diversidad Biológica y Ambiental de la Amazonía Peruana). Siamazonia was created in 2001 through the BIODAMAZ project (Proyecto Diversidad Biológica de la Amazonía Peruana), an agreement between Peru and Finland, and was developed by the Instituto de Investigaciones de la Amazonía Peruana (IIAP). IIAP is a GBIF node and therefore is a natural partner of the ABBIF network. Its structure is based on nodes, similar to GBIF. Siamazonia is already serving data to GBIF using DarwinCore/DiGIR.
Venezuela
Venezuela does not have an information system in place but is beginning to develop an integrated information system on Collections of Vertebrates (Sistema Integrado de Información de Colecciones de Vertebrados de Venezuela). There are 3 institutions participating in this initiative: Museo de Historia Natural La Salle (MHNLS), Museo de Biología de la Universidad Central de Venezuela (MBUCV), and Museo Estación Biológica Rancho Grande (EBRG). Although the intention is to set up a web interface for on-line access, the idea is to control access, having different levels of accessibility according to the user. This restriction, if maintained, will mean that the system will not be interoperable.