Digitization was a hot topic at the 2013 Association of Southeastern Biologists’ (ASB) meeting held in Charleston, West Virginia, the week of April 10. Well before the beginning of the ASB–iDigBio-sponsored digitization symposium and workshop, several conference goers had already offered important papers outlining strategies and successes in digitizing small herbaria and incorporating digitization into biodiversity field research.
Tanja Schuster, curator at the University of Maryland’s Norton-Brown Herbarium (MARY), outlined efforts in launching a virtual herbarium through the herbarium’s new website (www.nbh.psla.umd.edu). Important highlights included how federal work-study students recorded 21,000 high-resolution images of MARY’s 87,000 specimens. About 5,100 of these images are now available online, with plans for more to come. The new website offers an interactive key as well as important distributional data about Maryland’s invasive plant species. MARY uses Specify 6 as its database management system in concert with a custom PHP-based front end for serving data and images to the web.
Kari Harris, graduate student at Arkansas State University, reported on her master of arts project, for which she organized a cadre of four undergraduate students to annotate and image 17,500 of the STAR herbarium’s Arkansas specimens. The team was able to achieve a rate of about 175 specimens per hour and imaged the collection in little more than one semester. Student technicians are now using the images to database and georeference the collection, ensuring the availability of this small but important collection for botanical and ecological research. Moreover, Kari developed successful methods for attracting and organizing volunteer student workers in support of collections digitization.
University of North Carolina graduate student Derick Poindexter, no stranger to the Southeast Regional Network of Expertise and Collections (SERNEC) community, outlined methods for integrating field research, the production of a county flora, georeferencing technology, voucher deposition, and digital documentation, all in service of floristic surveys. Derick’s presentation underscored the importance of simultaneous integration of digital data into scientific research and curation protocols, a goal of many collections digitization initiatives.
The Friday morning (April 12) ASB–iDigBio-sponsored symposium attracted about 25 participants dominated by herbarium professionals but also including representatives from vertebrate, invertebrate, paleo, and algae collections. Following brief remarks by workshop coordinator Ashley Morris (Middle Tennessee State University), Gil Nelson offered an introduction to iDigBio and fielded questions about the National Science Foundation’s Advancing Digitization of Biodiversity Collections (ADBC) initiative before turning the program over to Kim Watson of the New York Botanical Garden (NYBG).
Kim’s presentation, “Plants, herbivores, and parasitoids: Tri-trophic digitization strategies,” detailed the practical implementation and workflows of an ADBC-funded Thematic Collections Network (TCN), one of the first three of the seven that are currently funded. The TCN involves the American Museum of Natural History, NYBG, and several collaborators. TCN members are integrating data on insect herbivores, their hosts, and related insect parasitoids in an effort to fill an important data gap in the study of tri-trophic relationships.
Hank Bart (Tulane University) followed Kim with “Efficiencies and challenges of organizing an ADBC TCN project on southeast freshwater macrofauna,” offering a detailed glimpse of the planning efforts required for developing a TCN concept and the resulting project description and proposal. Hank outlined plans for a huge digitization effort (including fishes, crawfishes, and aquatic mollusks) that spans numerous institutions in 11 Southeastern states. The proposed project capitalizes on an assemblage of previously georeferenced southeastern fish collection localities amassed through GeoLocate’s Collaborative Georeferencing Platform.
Following a brief but discussion-filled coffee break, Zack Murrell (Appalachian State University) recounted the major issues facing museum informatics and the importance of collective expertise in addressing solutions. Entitled “So many herbaria, so little time: Challenges and opportunities in biodiversity informatics,” Murrell’s presentation suggested that “we are now poised to address ‘dark data’ and the ‘long tail of science’ as we gather metadata and specimen data from smaller regional collections.”
Andy Bentley (University of Kansas Biodiversity Institute) ended the morning with “Collaborative digitization workflows with Specify 6,” which included an accounting of Specify’s capabilities for data entry, bulk data uploading and validation, image storage and linking, duplicate data discovery, OCR archiving, and integration of specimen images and labels into digitization workflows. Specify is free, open source software, funded initially by NSF and widely used in the collections community. Bentley pointed out that Specify software aligns with the goals of ADBC and iDigBio and with the requirements inherent in developing a Thematic Collections Network.
Friday’s symposium set the stage for a Saturday workshop focused on workflows and challenges in the digitization of biological specimens. The workshop, with more than 30 participants, followed a presentation/discussion format, with plenty of time for rich interchange among participants. The workshop’s goals were to:
- Introduce iDigBio (Integrated Digitized Biocollections),
- Review existing workflows from workshop participants,
- Detail important principles for digitization workflow design and development,
- Outline examples of major workflow patterns,
- Consider the cultural/social issues that underpin a digitization program,
- Present strategies for serving data and images on the web,
- Develop an understanding of the importance of providing identifiers for your specimens and data,
- Offer methods for improving data in Excel spreadsheets, and
- Provide plenty of time for discussion, contributions, sharing, and questions!
The workshop kicked off with Chris Dietrich, Principal Investigator on the InvertNet TCN, followed by a fascinating array of lightning round talks: 5-minute, 1-slide presentations through which participants highlighted the salient aspects of their digitization program or workflow.
Chris’s topic was “InvertNet: A new paradigm for digital access to invertebrate collections.” InvertNet is another of the three TCNs funded in the first round of the ADBC program. The project is centered on 22 arthropod collections held in institutions distributed throughout the upper Midwest. InvertNet’s goal is to develop high-throughput workflows and robotic technology for digitizing pinned insect drawers, including the development and adaptation of advanced imaging and 3D reconstruction.
Several participants offered lightning round presentations, including Hank Bart, Andy Bentley, Katherine Gregg (West Virginia Wesleyan College), Kari Harris, PJ Harmon (West Virginia Division of Natural Resources), George Johnson (Arkansas Tech), Ron Jones (Eastern Kentucky University), Ashley Morris, Mark Schlueter (Georgia Gwinnett College), Tanja Schuster, and Kim Watson. The diversity of lightning round presentations precipitated robust discussion centered on the efficacy of various approaches to collections digitization.
Following the lightning rounds, Gil Nelson presented an overview of workflow concepts and common practices, with an emphasis on the major task clusters that define the digitization process and key guidelines for designing and documenting workflow protocols. Deb Paul (iDigBio) followed with a multi-faceted look at several issues facing those involved in a collections digitization program, including an accounting of the social issues affecting collaborative digitization, the importance of assigning identifiers to digitized objects, and strategies for serving data and images to the web. Deb ended with a live demonstration of OpenRefine, open source software that provides advanced tools for cleaning data stored in spreadsheets and other common data formats. OpenRefine is especially well-received by those who regularly use Excel for data storage or as a vehicle for importing data into Specify, Symbiota, and other enterprise-level databases. The power of OpenRefine has spurred the Specify software team to consider integrating some of its functionality into the Specify Workbench.
The day ended at about 5 p.m., following an excellent round of discussion and with renewed vigor for addressing digitization challenges as well as numerous requests for more iDigBio-sponsored workshops.
To learn more about the contents of the presentations, visit the workshop wiki at https://www.idigbio.org/wiki/index.php/ASB_Digitization_Workshop.
iDigBio wishes to acknowledge and thank Ashley Morris and Hank Bart for the vision to organize this symposium and workshop combination, and especially Ashley Morris for overseeing the hard work required to bring these two events together.