Research Data Repositories

From iDigBio
Revision as of 09:49, 20 May 2016 by Grungle (talk | contribs)

A data repository, in its broadest sense, is a destination for data storage. There are many online data repositories, and several organizations, such as re3data.org (Registry of Research Data Repositories) and biosharing.org Information Resources, provide curated listings of them.

Although iDigBio is a repository for recordsets of primary biodiversity data from vouchered natural history collections, it is not a "data repository" as most journals define the term. Accepting individual researchers' datasets, even those consisting of digitized data and media from vouchered natural history specimens, currently falls outside the scope of iDigBio.

Journals such as Scientific Data and Biodiversity Data Journal are starting to mandate that researchers submit their data to a publicly accessible data repository prior to article publication, to support peer review, discoverability, repeatability, and reuse. Some journals will host data; others require that datasets be submitted to community-recognized or general-science repositories. Further, funding agencies increasingly require researchers to better account for, manage, archive, and make publicly accessible the digital data resulting from federally funded scientific research.

iDigBio ingests standardized data associated with vouchered natural history specimens from institutions and collections in the United States and around the globe. Accepted formats are Darwin Core and the Audubon Media Extension (Audubon Core), along with other data standards and extensions, including the Ecological Metadata Language (EML), the Global Genome Biodiversity Network (GGBN) extensions, and the Material Sample Core, among others (see the GBIF Registered Extensions).
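As a rough illustration of what "standardized data in Darwin Core" can look like, the sketch below writes one specimen record to CSV using Darwin Core term names as column headers. The term names are real Darwin Core terms, but the selection of columns, the function name occurrences_to_csv, and every value in the example record are invented for illustration. This is not iDigBio's ingestion format specification.

```python
import csv
import io

# Darwin Core term names used as column headers.
# Which terms to include is an assumption made for this sketch.
DWC_FIELDS = [
    "occurrenceID", "basisOfRecord", "institutionCode", "collectionCode",
    "catalogNumber", "scientificName", "recordedBy", "recordNumber",
    "eventDate", "country",
]


def occurrences_to_csv(records):
    """Serialize dicts keyed by Darwin Core terms to CSV text.

    Unknown keys raise ValueError; missing keys are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=DWC_FIELDS, extrasaction="raise", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical record: every value here is a placeholder, not real data.
example_record = {
    "occurrenceID": "urn:uuid:00000000-0000-0000-0000-000000000001",
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "XYZ",
    "collectionCode": "Herbarium",
    "catalogNumber": "12345",
    "scientificName": "Quercus alba L.",
    "recordedBy": "A. Collector",
    "recordNumber": "678",
    "eventDate": "2016-05-20",
    "country": "United States",
}

csv_text = occurrences_to_csv([example_record])
print(csv_text)
```

Using the standard term names as headers, rather than local column names, is what lets the same file be read by any tool that understands Darwin Core.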

iDigBio accepts primary biodiversity data and associated media from institutions and collections housing vouchered biological specimens. Lists of collections can be found on the iDigBio U.S. Collections page, in the Index Herbariorum, or in GRBio (Global Repository of Biodiversity Collections).

iDigBio recommends, as a first step, that researchers repatriate data and media to the institution at which each specimen is housed. This will enhance the collection's data set and will also result in the data being shared, ultimately, through iDigBio. If the institution is not able to accept or incorporate your data within its collection management systems or data systems, consider publishing your dataset publicly, with appropriate metadata, in one of the many online data repositories. These can provide a DOI (digital object identifier) or other identifier for the dataset, which can be shared with collections and other downstream users.

There are many online organizations that will publish your valuable scientific data, including voucher specimen primary biodiversity data and associated media. Data should be curated and static, but new versions can be published. The following are by no means the only options available.

Name | URL | Information
Dryad Digital Repository | http://datadryad.org/ | A curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of data types. Data publishing charges may apply, but many journals, societies and institutions provide sponsorship or are paid members.
figShare | http://figshare.com/ | Free accounts to individuals for the publishing of publicly available research data. It allows users to upload any file format to be previewed in the browser so that any research output, from posters and presentations to datasets and code, can be disseminated.
DataONE Dash | https://dash.cdlib.org/ | A self-service tool for researchers to describe, upload, and share their research data. DataONE comprises a distributed network of data centers, science networks or organizations which can expose their data within the DataONE network.
GBIF | http://www.gbif.org/publishing-data/summary | The Global Biodiversity Information Facility (GBIF) accepts metadata, checklist, occurrence-only (including observational), and sampling-event data from organizations. Individuals will need to work through their organization to publish research datasets.
PlutoF | https://plutof.ut.ee/ | Enables users to create, manage, share, analyse and publish biology-related databases and projects.
CyVerse Data Commons Repository | http://www.cyverse.org/data-store | Assigns DOIs and ARK identifiers. If your dataset is complete, stable, ready for public consumption and can be reused in other analyses, CyVerse may be the data repository for you.
Morphobank | http://morphobank.org/ | A collaborative research platform linking morphological data with phylogenetic data. MorphoBank can be thought of as two databases in one: one that permits researchers to upload images and affiliated data with those images (labels, species names, etc.) and a second database that allows researchers to upload morphological data and affiliate it with phylogenetic matrices. When a paper associated with the project is published, the researcher or research team can make their data permanently available for view on MorphoBank where it is archived.
Morphbank :: Biological Imaging | http://www.morphbank.net/ | A biological image database that accepts images and data associated with vouchered natural history collections. Morphbank is a publisher with iDigBio, sharing data through an IPT feed.
GenBank | http://www.ncbi.nlm.nih.gov/genbank/ | Accepts nucleic acid sequence data associated with voucher specimens. For ease of discovery and downstream linking, tracking and citation, ensure that the Source: specimen-voucher fields are appropriately completed, including the unique identifier belonging to the specimen and other details recommended for barcode submissions.
BOLDSystems | http://boldsystems.org/ | Barcode of Life Data Systems is a data repository for DNA barcoding data.

Many institutions have library systems that will also publish data, so ask your local librarian!

iDigBio highly recommends including the occurrenceID (Occurrence ID) and/or iDigBio recordID with each data record. If the data have been derived from voucher specimens, also include the institution code, collection code and catalog number (and/or the collector and collection number, so that the collector's efforts can also be acknowledged). This allows for downstream data tracking and appropriate citation of your efforts, of the original specimen, and of the collection that houses it. It also allows institutions and collections to discover any data quality improvements you may have made and to incorporate them for future publication and collections management. Some publishers, such as Pensoft, are making it easy to import occurrence records into manuscripts from data repositories such as iDigBio, GBIF, PlutoF and BOLD. Regardless of the publication status of the associated research, iDigBio recommends that researchers publish biodiversity data to a public data repository for data discovery, preservation, and reproducible science.
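The identifier recommendation above can be checked before a dataset is deposited. The following is a minimal sketch, assuming each record is a dict keyed by column name. The column name idigbio_recordID and the function missing_identifiers are hypothetical, and the Darwin Core terms institutionCode, collectionCode and catalogNumber are an assumed mapping for the specimen fields. Adjust all of these to match your own dataset.

```python
# Column names are assumptions for this sketch, not an iDigBio requirement.
# "idigbio_recordID" is a hypothetical name for the iDigBio recordID column.
ID_FIELDS = ("occurrenceID", "idigbio_recordID")
VOUCHER_FIELDS = ("institutionCode", "collectionCode", "catalogNumber")


def _has_value(record, name):
    """True if the field is present and not blank (None counts as blank)."""
    value = record.get(name)
    return value is not None and str(value).strip() != ""


def missing_identifiers(record):
    """Return a list of identifier problems for one record (empty if none)."""
    problems = []
    # At least one of the record-level identifiers should be present.
    if not any(_has_value(record, name) for name in ID_FIELDS):
        problems.append("no occurrenceID or iDigBio recordID")
    # Each voucher field is reported separately so gaps are easy to fix.
    for name in VOUCHER_FIELDS:
        if not _has_value(record, name):
            problems.append(f"missing {name}")
    return problems


# Hypothetical records: all values are placeholders.
complete = {
    "occurrenceID": "urn:uuid:00000000-0000-0000-0000-000000000001",
    "institutionCode": "XYZ",
    "collectionCode": "Herbarium",
    "catalogNumber": "12345",
}
incomplete = {"scientificName": "Quercus alba L.", "catalogNumber": "  "}

for label, record in (("complete", complete), ("incomplete", incomplete)):
    print(label, missing_identifiers(record))
```

Reporting every gap per record, rather than stopping at the first one, makes it easier to correct a whole spreadsheet in one pass.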

As always, reach out to us at iDigBio if you need more ideas or guidance on sharing biodiversity research data.