Abstract | The world’s natural history museums, herbaria, and other specialized collections documenting our planet’s biodiversity, collectively known as “biocollections,” are estimated to contain between 2 billion and 3 billion specimens. These biocollections are truly global in scope in terms of the taxa, regions, and time periods represented in their holdings, and they reflect hundreds of years of expense and effort in collection, preservation, and study. The specimens hold clues that help us understand evolution, species distributions, the introduction of pests and diseases, and the movement of invasive species, and they may help us predict the effects of extinction events and climate change. Creating and sharing a digital record of a specimen increases its discoverability and use in research. Yet despite technological advances and supporting grant initiatives, digitization workflow bottlenecks continue to impede the flow of data needed for Big Science, and possibly for the future of humanity. Advances in the automation of optical character recognition and natural language processing, and in the interfaces that support them, are necessary for a transformative breakthrough in high-throughput digitization of biocollections. |