An identifier is an unambiguous label which specifies an entity.
Unique identifiers are widely used to designate physical objects, assisting in trading (e.g., the Universal Product Code bar code system), and the extension of similar principles to digital and abstract entities is a prerequisite for digital commerce of rights and intellectual content.
Although the design of unique identification schemes is a technical problem, it is also a business issue with implications for what is identified and how identified items are made available.
“In a dynamic and distributed information environment, the effective management of both metadata records and the resources they describe requires a systematic way of generating and assigning unique identifiers.”
(N. Friesen 2002: Recommendations for Globally Unique, Location-Independent, Persistent Identifiers)
Life Sciences - Bioinformatics
“The World-Wide Web provides a globally distributed communication framework that is essential for almost all scientific collaboration, including bioinformatics. However, several limits and inadequacies have become apparent, one of which is the inability to programmatically identify locally named objects that may be widely distributed over the network. This shortcoming limits our ability to integrate multiple knowledgebases, each of which gives partial information of a shared domain, as is commonly seen in Bioinformatics”
(Clark, T., Martin, S., Liefeld, T., 2004: Globally distributed object identification for biological knowledgebases. Briefings in Bioinformatics, Vol. 5 (1), 59-70.)
Automatically register samples when sample metadata are entered into collaborating data systems (e.g. IODP, MGDS)
Eliminates redundant metadata submission
Systems communicate via web services
Starting with REST-based services; SOAP could be supported in the future.
Authentication
Investigating different technologies including GEON/GAMA
Metadata exchange and validation
XML schema
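The metadata exchange outlined above can be illustrated with a minimal sketch. It assumes a flat sample record and hypothetical element names (`igsn`, `name`, `sample_type`); the actual SESAR XML schema is not given in these slides. The required-field check is a simple stand-in for schema validation, not validation against a real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; the real SESAR schema is not given in this document.
REQUIRED_FIELDS = ("igsn", "name", "sample_type")


def sample_to_xml(sample: dict) -> str:
    """Serialize one sample record as a <sample> element with one child per field."""
    root = ET.Element("sample")
    for key, value in sample.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def missing_fields(xml_text: str) -> list:
    """Return the required fields that are absent or empty in the XML record."""
    root = ET.fromstring(xml_text)
    missing = []
    for field in REQUIRED_FIELDS:
        element = root.find(field)
        if element is None or not (element.text or "").strip():
            missing.append(field)
    return missing


# Illustrative record; the IGSN value is made up for this example.
record = {"igsn": "DR0000001", "name": "Core 12, Section 3", "sample_type": "Core"}
xml_text = sample_to_xml(record)
print(xml_text)
print(missing_fields(xml_text))  # an empty list means all required fields are present
```

A receiving system could run a check like `missing_fields` before accepting a record, so that incomplete metadata is rejected at submission time rather than discovered later.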
SESAR Service “MyGeoSamples”
Current Services:
Long-term preservation of information about samples
Lists of personal sample collections
Store images, field notes, etc.
Services “Under Construction”
Search & sort personal sample collections
Create maps of sample locations
Establish links to data (publications, data systems)
Download tabular sample information to spreadsheets
Potential Services:
Modules to manage administrative metadata (customizable)
Modules for creating & operating web interfaces to collections
Advantages
No IT infrastructure required (except a computer and an internet connection)
No maintenance or risk & contingency management required
Access from anywhere by authorized individuals.
Platform independent
The SESAR Global Sample Catalog
SESAR integrates the world’s sample collections
Allows users to find/discover existing samples
Provides access to “sample profiles”
View sample information in SESAR as provided
Link to the specimen’s ‘home’ (archive)
Link to data (publications, databases)
The Challenges
IGSN Implementation Strategies
Work with investigators, curators and repositories to define the registration process and integrate it and the IGSN into existing sample and data management workflows
Joint Workshop of SESAR & NGDC, February 26 & 27, 2007, Boulder, CO
Registration of repository and museum collections ongoing
Advance adoption of IGSN
Work with editors to make IGSN a requirement for data publication (e.g. Editors’ Round Table, Societies)
Work with funding agencies, large science programs (e.g. IODP, MARGINS, ANDRILL), CI projects (e.g. GEON, CHRONOS), and repositories on sample and data archiving policies
Work with CI Partners on system design & interoperability
Interoperability Workshop, January 2005 at SDSC
Working with GEON on authentication scheme
Working with IODP and KU/EarthChem on web services
Editor’s Breakout*
- Reporting Data:
The published paper is the point of record. All data should be reported. No “representative data”, no “data can be obtained from author”, no “data available at personal websites”.
Submission to databases should be strongly encouraged
Unique sample identifier (IGSN)
This may solve the problem of poor sample metadata
This system is being implemented.
Essential component of a successful database: contains sample metadata and allows samples to be followed through their analytical history.
Tracks samples and subsamples.
We should start using it now.
Support by Funding Agencies
“We have also funded an effort (SESAR) to uniquely identify all samples so that various analyses on the same samples can be cross referenced and listed. I would also like you to indicate in your dissemination plan that your suite of samples will be registered with SESAR.”
Letter of NSF Program Manager (OCE/MG&G) to a PI, processing paperwork for a grant (January 2007)
identifying, organizing, documenting, and cataloging existing data collections, preferably in a digital format;
constructing logical linkages and search engines that facilitate access to organizations and their geoscience sample and data collections;
dedicating adequate space — physical and digital — for storing and efficient accessing of existing and future samples and data sets;”
Joint Workshop of SESAR & NGDC IMLGS Boulder, CO, February 26 & 27, 2007
Define procedures & best-practices for
Creating & assigning IGSNs
Submitting metadata for GeoObjects to SESAR
Work towards an integrated system of sample catalogs
Recommend ways to define & implement standards for metadata and vocabularies
Identify possibilities for streamlining procedures for submission of sample metadata to catalogs
Workshop Recommendations
Streamlined Registration Process
Registration process should be simple
Options to integrate easily into existing sample and data management workflows
Ability to adopt required metadata from existing forms in use to avoid redundant metadata submission to multiple systems
Support automated registration from other systems via web services to avoid manual/redundant metadata submission
Objects should receive an IGSN at the time of labeling
Objects should have an IGSN before being distributed among multiple investigators and users
Parent objects should be registered before child objects
Metadata should include geospatial information (coordinates preferred)
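The recommendation that parent objects be registered before child objects can be sketched as an ordering step applied to a batch before submission. The sample names and the name-to-parent mapping below are hypothetical; the slides do not specify how SESAR represents these relations.

```python
def registration_order(samples: dict) -> list:
    """Order sample names so that each parent precedes its children.

    `samples` maps a sample name to its parent's name, or None for a top-level object.
    """
    ordered = []
    visiting = set()
    done = set()

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"cyclic parent-child relation at {name!r}")
        visiting.add(name)
        parent = samples.get(name)
        if parent is not None:
            if parent not in samples:
                raise ValueError(f"parent {parent!r} of {name!r} is not in the batch")
            visit(parent)  # register the parent first
        visiting.discard(name)
        done.add(name)
        ordered.append(name)

    for name in samples:
        visit(name)
    return ordered


# Hypothetical batch: a hole, two cores taken from it, and a slab cut from one core.
batch = {"slab-A1": "core-A", "core-A": "hole-1", "hole-1": None, "core-B": "hole-1"}
print(registration_order(batch))
# → ['hole-1', 'core-A', 'slab-A1', 'core-B']
```

Rejecting a batch whose parent is missing or whose relations form a cycle keeps every child traceable to an already registered parent.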
Batch Registration Forms
It is preferred that forms for the MGDS, IMLGS, and SESAR use the same column headers, with the metadata listed under each header clearly defined. The order of the headers can vary.
An XML schema for sample metadata should be developed to which the metadata in any spreadsheet can be exported.
SESAR Batch Registration Forms should be customizable, e.g. buttons beneath the header should allow users to hide unnecessary columns. Columns for metadata that are identified as ‘recommended’ should always be visible.
SESAR should develop a manual for filling out the forms. The manual should include instructions on defining parent-child relations. It needs to be decided whether a site should get an IGSN. Multiple stations taken at one site can be linked by including the site name as metadata.
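The point that forms share column headers while the column order may vary can be illustrated by reading each form by header name rather than by position. This is a minimal sketch; the header names below are hypothetical, since the actual form headers are not listed in these slides.

```python
import csv
import io

# Hypothetical shared column headers; the actual form headers are not listed here.
SHARED_HEADERS = {"sample_name", "latitude", "longitude", "parent_name"}


def read_form(text: str) -> list:
    """Read a comma-separated form by header name, ignoring column order."""
    reader = csv.DictReader(io.StringIO(text))
    headers = set(reader.fieldnames or [])
    missing = SHARED_HEADERS - headers
    if missing:
        raise ValueError(f"form is missing headers: {sorted(missing)}")
    return [{h: row[h] for h in sorted(SHARED_HEADERS)} for row in reader]


# Two forms with the same headers in a different order (made-up values).
form_a = "sample_name,latitude,longitude,parent_name\nST-01,9.5,-104.2,\n"
form_b = "parent_name,longitude,sample_name,latitude\n,-104.2,ST-01,9.5\n"
print(read_form(form_a) == read_form(form_b))  # → True
```

Because rows are keyed by header, a form exported from one system can be read by another without agreeing on column positions.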
Vocabularies and Classification Schemes
Adopt from existing standards as much as possible and work with repositories and other systems to use common schemes
It is preferable for different systems (MGDS, IMLGS, SESAR) to allow multiple vocabularies
List allowed vocabularies on the Marine Metadata Initiative (MMI) web site.
Registration Procedures to Support Integration with Existing Workflows: Under Implementation
Trusted Agents
A registrant can apply to become a Trusted Agent. Trusted Agents are authorized to generate unique IGSNs within their registered name space (user code). They can use tools, e.g. Excel, on the ship or in the field to generate IGSNs within their given name space, have the samples labeled with the IGSNs, and submit the IGSNs along with metadata via web services within a short time frame. Trusted Agents must sign an MOU outlining policy and procedures for handling IGSNs.
Example IODP: Name Space “DR0”, “DR1”,…
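Offline generation of IGSNs within a name space, as a Trusted Agent might do on a ship, can be sketched as follows. The identifier layout used here (the user code followed by a fixed-width base-36 counter) is an assumption made for illustration; the slides give only the example name spaces “DR0” and “DR1” and do not define the format.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # base-36 digits (an assumption)
SUFFIX_WIDTH = 6  # hypothetical fixed width of the running part


def to_base36(number: int, width: int) -> str:
    """Encode a non-negative counter as a zero-padded base-36 string."""
    if number < 0:
        raise ValueError("counter must be non-negative")
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    text = "".join(reversed(digits)).rjust(width, "0")
    if len(text) > width:
        raise ValueError("name space exhausted for this width")
    return text


def generate_igsns(user_code: str, start: int, count: int) -> list:
    """Generate `count` consecutive identifiers within one name space."""
    return [user_code + to_base36(n, SUFFIX_WIDTH) for n in range(start, start + count)]


print(generate_igsns("DR0", 1, 3))
# → ['DR0000001', 'DR0000002', 'DR0000003']
```

Since each agent draws only from its own user code and a running counter, identifiers generated without a network connection cannot collide with those of another agent, provided the agent tracks the last counter value used.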
Pre-Assigned IGSNs
Upon request, SESAR provides forms (spreadsheets) with pre-assigned IGSNs to chief scientists/investigators/repositories to take on ship/field. Forms filled with metadata should be submitted to SESAR post-collection. E.g.: SCRIPPS.
Other systems or repositories pre-populate their existing forms with IGSNs obtained from SESAR and provide them to chief scientists. E.g.: MGDS provides forms with IGSNs to PIs in advance of R2K and MARGINS cruises. Post-cruise, MGDS will submit the sample metadata to SESAR.
Collaboration with Repositories & Systems: Ongoing