Research Spotlight: November 2021

 

Mass digitization of herbarium specimens using a novel photostation design
Jonathan Kennedy and Charles Davis; Harvard University Herbaria, Cambridge, MA, USA

Citation: Davis, C. C., Kennedy, J. A. & Grassa, C. J. Back to the future: A refined single-user photostation for massively scaling herbarium digitization. Taxon 70, 635-643, doi:https://doi.org/10.1002/tax.12459 (2021).


1. Current HUH photostation being operated by a herbarium staff member.

The digitization of natural history collections has continued to accelerate in the United States and globally. Over 200 million specimen records are now available online through various institutional and community-specific portals. Since the inception of these efforts, and with significant investment from governmental funding agencies such as the National Science Foundation, the biodiversity and collections communities have worked to advance the technologies, workflows, and best practices underpinning digitization. In herbaria, pressed and dried plant specimens account for a majority of the collections, and efforts to digitize these specimens have relied heavily on two-dimensional photographic imaging technologies. Over the years, the community has experimented with and organized imaging workflows around the use of scanners, photographic copy stands, lightboxes, and, most recently, industrial-scale conveyor belt systems (the latter two often using “prosumer”- or commercial-grade digital cameras). Leveraging these advances, combined with emerging vendor solutions, a handful of institutions have successfully undertaken comprehensive, herbarium-wide digitization projects. The best-known examples include the Smithsonian, the Natural History Museum in Paris, and the Naturalis Biodiversity Center in Leiden. However, despite these successes, digitizing entire herbarium collections has remained economically out of reach for all but these few well-funded, nationally owned collections.


2. Conveyor belt system created for the New England Vascular Plants Project, circa 2012-2016.

In 2016, our team of staff and researchers at the Harvard University Herbaria (HUH) began investigating methods to affordably digitize large numbers of herbarium specimens. This was motivated by a strategic vision to digitize all of the HUH’s over 5 million plant and fungal specimens, of which ca. 4 million are pressed and dried vascular plants. The HUH had previously collaborated on multiple thematic collection networks (TCNs), including the New England Vascular Plants project, which pioneered the use of a conveyor belt system for herbarium specimen digitization. Upon completion of this project, we carefully evaluated the conveyor belt system for future use and explored similar vendor options, but our analysis uncovered various obstacles and hidden costs to using these systems. These obstacles largely centered on the size and space requirements of a conveyor belt: a suitable space was not readily available at the HUH, and moving specimens out of the building for digitization, especially in large numbers, raised concerns about the time specimens would be removed from the collection, the risk of damage, the cost of additional pre-curation, and the management of re-entry into pest-management zones. We also observed that conveyor belt systems require multiple staff to operate; thus, even small interruptions to staff availability can considerably impact the real-world efficiency of these systems. Further, we noted that the efficiency of conveyor belts is naturally limited by the speed at which staff place specimens on and remove them from the system. This observation, in particular, motivated our team to compare the efficiency of conveyor belt systems against the same number of staff using more traditional single-user copy stands.


3. Traditional workstation and copy stand used at many herbaria.

To establish the baseline efficiency of a single-user imaging station, we assembled a workstation using a table, copy stand, and camera, all equipment that had been used for past digitization projects at the HUH. For this experiment, we eliminated certain workflow steps that staff had performed in previous projects but which we believed could be replaced with automation. This meant eliminating all data entry, including barcode capture, as well as image adjustments. Through this experiment we were able to demonstrate that, on a per-staff basis, a single-user imaging station could achieve imaging throughput similar to or faster than that of our previous conveyor belt system, and at a cost competitive with current vendor solutions. These results inspired our team to shift our focus to investigating methods for improving the speed and efficiency of traditional single-user imaging equipment. Through continued research and development, we added capabilities to our informatics pipeline to reduce manual effort and evolved the traditional copy stand approach by designing an imaging station (photostation) tailored to herbarium workflows.


4. Current HUH photostation with dimensions. The small footprint makes it easy to digitize near the collections. 

The result is a photostation design (Image 4) that significantly increases the efficiency of imaging herbarium specimens, especially in large numbers. Our design benefits from many ergonomic and workflow-specific customizations, including physical dimensions tailored to herbarium sheets, an open-stage design for rapid placement and removal of specimens, a built-in shelf for holding specimens, guides on the photo stage for quick specimen placement, an attached automatic barcode dispenser, a USB button for quick photo capture, adjustable-height legs, and an eye-level monitor. The photostation also requires minimal space and is semi-mobile, making it much easier to operate near the collection. Individually, each of these customizations saves a small amount of imaging time over a traditional copy stand, but, cumulatively, these refinements have yielded substantial efficiency gains, reducing imaging time, in some cases, to only a few seconds per image. Through our analysis of extensive real-world data, we have observed average imaging times of less than 10 seconds per specimen image.
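As a rough illustration of what such a rate implies, the short Python sketch below converts a per-image time into hourly throughput and into the number of working days needed for a given number of specimens. Only the 10-second figure comes from the text above; the function names, the number of imaging hours per day, and the specimen count in the usage note are hypothetical values chosen for the arithmetic, and the result ignores breaks, specimen retrieval, and refiling.

```python
def images_per_hour(seconds_per_image: float) -> float:
    """Convert an average per-image time into images per hour."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return 3600.0 / seconds_per_image


def station_days(n_specimens: int,
                 seconds_per_image: float,
                 imaging_hours_per_day: float,
                 stations: int = 1) -> float:
    """Estimate working days needed to image n_specimens.

    imaging_hours_per_day and stations are assumptions supplied by the
    caller; they are not figures reported in the article.
    """
    per_day = images_per_hour(seconds_per_image) * imaging_hours_per_day * stations
    return n_specimens / per_day


if __name__ == "__main__":
    # 10 s per image is the upper bound quoted in the text.
    print(images_per_hour(10))
    # Hypothetical scenario: 10,000 specimens, 5 imaging hours per day, one station.
    print(round(station_days(10_000, 10, 5), 1))
```

At 10 seconds per image, a single station works out to 360 images per hour of uninterrupted imaging; actual daily totals depend on how many hours staff spend at the station.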

In addition to improving the physical equipment, we eliminated data entry and manual image processing from our workflow by either automating these steps or batching them post-imaging. Barcode detection occurs in software using the specimen image, and skeletal records are automatically added to our collections database. Specimens are imaged in “batches,” which can be viewed and rapidly databased online using our web-based “Transcription App.” Staff can create specimen records with varying levels of data capture (i.e., skeletal, minimal, full label capture) and are prompted to transcribe specific fields based on project requirements or other policies. Additionally, batches of existing records can be organized to receive additional, in-depth transcription, supporting multi-phased data capture workflows.
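The separation between imaging and data entry described above can be pictured with a small Python sketch. It is not the HUH pipeline: the record structure, the field names attached to each capture level, and the function names are all hypothetical, and barcode decoding itself is assumed to have already happened upstream (the sketch simply receives the decoded value, or `None` when no barcode could be read).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical capture levels and field lists, for illustration only.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "skeletal": ["barcode", "image_file"],
    "minimal": ["barcode", "image_file", "scientific_name", "country"],
    "full": ["barcode", "image_file", "scientific_name", "country",
             "collector", "collection_date", "locality"],
}


@dataclass
class SpecimenRecord:
    barcode: str
    image_file: str
    data: Dict[str, str] = field(default_factory=dict)

    def missing_fields(self, level: str) -> List[str]:
        """Fields still to be transcribed for the requested capture level."""
        filled = {"barcode", "image_file"}
        filled |= {name for name, value in self.data.items() if value}
        return [name for name in REQUIRED_FIELDS[level] if name not in filled]


def skeletal_records(
    batch: List[Tuple[str, Optional[str]]]
) -> Tuple[List[SpecimenRecord], List[str]]:
    """Create skeletal records for one imaging batch.

    batch holds (image_file, decoded_barcode) pairs; images whose barcode
    could not be decoded are returned separately for manual review.
    """
    records: List[SpecimenRecord] = []
    unreadable: List[str] = []
    for image_file, barcode in batch:
        if barcode:
            records.append(SpecimenRecord(barcode=barcode, image_file=image_file))
        else:
            unreadable.append(image_file)
    return records, unreadable


if __name__ == "__main__":
    batch = [("img_0001.jpg", "00012345"), ("img_0002.jpg", None)]
    records, unreadable = skeletal_records(batch)
    print(records[0].missing_fields("minimal"))
    print(unreadable)
```

In this sketch a record is complete at the skeletal level as soon as it exists, and a later transcription pass can ask which fields are still missing for a deeper level, which mirrors the multi-phased capture described above.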


5. Screenshot of the HUH Transcription App. Specimen data is rapidly transcribed from the specimen image. A magnified view helps with reading and can be moved by clicking on the overview image.

The combined benefits of 1) refining our digitization equipment, 2) isolating the imaging workflow from data entry, and 3) investing in informatics tools to reduce manual effort have allowed us to dramatically increase the efficiency of digitization at the HUH. In 2017, the HUH launched an initiative to digitize all of its North American vascular plant specimens, while concurrently contributing to multiple past and ongoing TCNs. Since 2017, we have deployed six of these photostations in the herbarium, and HUH staff have imaged and databased nearly 600,000 specimens, bringing our online specimen record total to nearly 1.5 million. During this time, citations of HUH specimen records also increased by an average of over 80% each year. Like many herbaria, the HUH has historically undertaken digitization as external funding was available. However, the efficiency and flexibility of our improved workflow have enabled the HUH to continue digitizing the collection as an ongoing strategic priority.

This workflow also proved beneficial as the COVID-19 pandemic unfolded. While many universities sent staff home and wrestled with decisions about those who could not work remotely, HUH staff were able to continue databasing and georeferencing specimens without the need for physical access to the collection. Later, when campus reopened under a low-density occupancy plan, HUH staff were able to return to the office one day per week and generate sufficient specimen images to continue digitizing from home.

Motivated by the continued success of our workflow, the HUH, along with 19 other collaborating institutions across the U.S., was recently awarded funding from NSF’s ADBC program to form the All Asia TCN. Asia represents one-third of the Earth’s landmass and flora and contains an incredible range of climatic, geographic, and species diversity, and, importantly, numerous critically endangered biodiversity hotspots. This project will digitize ca. 3 million vascular plant specimens from all parts of Asia, create a community portal aggregating ca. 15 million Asian vascular plant specimen records, continue the development of informatics tools for specimen digitization, and support our consortium partners in adopting elements of our high-throughput workflow. We anticipate the scope of this project will demonstrate that digitizing collections en masse is both economically advantageous and yields greater long-term use of collections.

While digitization in U.S. herbaria has progressed at an admirable pace, most of the world’s estimated 390 million plant specimens lack any online record that can be used for discovery or data analysis. Hidden in these collections are valuable insights into pressing global issues of nature conservation, biodiversity loss, and climate change. We believe that the scope of these problems (for collections and for biodiversity science) requires solutions of similar scale. We anticipate this work will be useful to the community and that others can benefit from our photostation design, and we hope that our experience encourages greater investment in digitization with an eye toward mobilizing our collections at scale.
