Bringing ‘dark data’ into the light: Best practices for digitizing herbarium collections
New workflow modules will facilitate imaging and data transcription for thousands of plant specimens
Imagine the scientific discoveries that would result from a searchable online database containing millions of plant, algae, and fungi specimen records. Thanks to a new set of workflow modules to digitize specimen collections currently preserved in herbaria, something like that might be within reach. The modules are provided by the National Science Foundation’s (NSF) Integrated Digitized Biocollections (iDigBio), which is facilitating a collective effort to unify digitization projects across the nation.
“North America’s herbaria curate approximately 74 million specimens and only a fraction have made it online,” says iDigBio’s digitization specialist Dr. Gil Nelson. “Having these data available at one’s fingertips will enable advanced queries and new discoveries while ensuring inclusion of the so-called ‘dark data’ that reside in a significant percentage of the United States’ more than 600 active herbaria.”
According to recent estimates, approximately half of U.S. herbaria and universities have yet to begin mobilizing data. Nelson coordinated the development of the workflows, working alongside 28 other contributing authors, to provide guidance to institutions just beginning digitization programs as well as those seeking to streamline and tweak their current digitization configuration.
The 14 modules, each organized in 7–36 easy-to-follow and customizable tasks, cover everything from setting up an imaging station to georeferencing. They also include methods to organize outreach events for public participation in imaging and data transcription. They are downloadable as Portable Document Format (PDF) and editable word processing files on GitHub (https://github.com/iDigBioWorkflows/FlatSheetsDigitizationWorkflows) and as PDF files at iDigBio (https://www.idigbio.org/content/workflow-modules-and-task-lists). A full description of the workflows and their development, along with editable word processing files of the workflow modules, is available in the September issue of Applications in Plant Sciences (http://www.bioone.org/doi/pdf/10.3732/apps.1500065).
iDigBio first launched working groups in 2012 to address a deficit in online biodiversity data. Six initial modules sparked an increase in digitization, but evolving digitization and curatorial practices made more comprehensive task lists possible. The latest set of modules is the result of continued collaborations, virtual meetings, visits to many herbaria, iDigBio workshops involving over 50 researchers, and contributions from 15 NSF-funded digitization projects.
“The greatest challenge in producing generic, broadly applicable workflows was determining and presenting a consensus statement of agreed-upon components while preserving maximum flexibility for institutional implementation over a broad array of herbaria,” says Nelson.
For Nelson, digitization is the starting point of new avenues to guide biological and ecological research. He envisions huge multi-organismal data sets that will enable researchers to study yet-to-be-recognized ecological, biological, and cultural relationships. The work at iDigBio is laying the foundation for a very powerful online resource.
iDigBio provides digitization education and resources to institutions across the United States and is funded by the NSF’s Advancing Digitization of Biodiversity Collections program (ADBC).
This publication originated from modules created by iDigBio's DROID working group.