Entomologists Gather for Insect Digitization Workshop in Chicago

Chicago’s Field Museum of Natural History (FMNH) turned out to be the perfect venue for iDigBio’s Dried Insect Digitization Workshop, held April 23–25, 2013. Overlooking Grant Park and the Chicago lakefront, FMNH provided an exceptionally attractive and hospitable environment with outstanding amenities. About 50 entomologists and digitization professionals from the U.S., Australia, and the United Kingdom attended. Together they brought a diverse range of knowledge and skill to the complex job of digitizing pinned insect collections. U.S. attendees represented institutions across the continent, from Alaska and Southern California to Florida and New England. iDigBio owes a tremendous debt to the Field Museum, especially Margaret Thayer and Petra Sierwald, for co-sponsoring the workshop and ensuring its success.

Those who wish to review the workshop’s agenda, presentations, and collaborative notes documents are invited to visit the workshop wiki accessible from iDigBio’s Digitization Resources Wiki (https://www.idigbio.org/wiki/index.php/Digitization_Resources), or directly at https://www.idigbio.org/wiki/index.php/Dried_Insect_Digitization_Workshop. Presentations and related documents are posted and available for download.

Activities started Tuesday evening with a reception and dinner. The evening offered opportunities to make new friends, renew acquaintances, and begin the discussion of insect specimen digitization. Many thanks go to iDigBio’s Cathy Bester, whose logistical prowess made the event possible. Thanks also go to the attentive staff at Mei’s Kitchen for ensuring a successful start to the workshop.

Wednesday began with a delightful walk through Grant Park from the hotel to the museum’s botany classroom. There, Gil Nelson kicked off the agenda with an introduction to iDigBio and the National Science Foundation’s Advancing Digitization of Biodiversity Collections (ADBC) initiative. He then turned the program over to Margaret Thayer for a presentation on the issues collections managers should consider before launching a digitization program. Margaret illustrated her talk with examples from the Field Museum. She stressed the importance of establishing well-thought-out goals and purposes before implementing a program.

Andy Deans (Pennsylvania State University) launched a whole-drawer digitization program while at North Carolina State. He underscored Margaret’s admonitions as he outlined what he had learned from that experience. He also described how he will apply those lessons at the Frost Entomology Museum as he launches yet another digitization program.

Library scientist Larry Schmidt (University of Wyoming) encouraged participants to use institutional resources beyond their own department or college. In particular, he urged them to enlist help from library professionals at their institutions. Larry pointed out that library scientists usually have long histories with digitization. They are also often keenly interested in making university-generated information, including biodiversity collections data, available to people outside the university community. He also outlined a collaborative program between a library and an herbarium that he reported in a 2007 paper.

Imaging is a major component of any specimen digitization program and is especially challenging for entomology. In the pre-workshop needs survey, participants expressed strong interest in learning about strategies, guidelines, and tools for establishing effective imaging workflows and protocols. Roy Larimer (Visionary Digital) is well known for his work with numerous entomology collections. He offered a valuable array of considerations, ranging from workflow protocols and imaging standards to effective software solutions. Roy also compiled a helpful set of links to web-based resources, which is available on the workshop wiki.

Whole-drawer imaging is gaining traction in the U.S., as a number of recent initiatives show. Perhaps the most ambitious is the technology being developed by the InvertNet Thematic Collections Network (TCN), one of the three TCNs funded in the first round of ADBC awards. Whole-drawer imaging promises to be an effective rapid-throughput strategy for digitizing thousands of insect drawers and unit trays in a comparatively short time.

Chris Dietrich (Illinois Natural History Survey), Principal Investigator for InvertNet, gave participants an update on the project. This included a review of the whole-drawer imaging robot that project engineers are developing. The robotic camera system is now operational, as David Raila demonstrated. Development and refinement continue on the software that stitches the resulting images into a single composite image. Future iterations of the software will segment the stitched images and isolate individual specimens for enhanced data capture. The robot is designed to tilt the camera to image both specimens and labels. In effect, this produces a representation that is more than two-dimensional, if not quite fully three-dimensional. When perfected, InvertNet’s technology has the potential to foster important advances in arthropod digitization.

Brian Wiegmann (North Carolina State University and Associate Director for Education and Outreach at NESCent) expanded on the use of Gigapan technology at NC State, a system launched in collaboration with Andy Deans. Gigapan is essentially a robotic panoramic camera system. Its stitching software creates high-resolution panoramic images that can be examined in an online viewer. Brian and Andy used the technology to capture insect drawers and serve them on the web. Their primary goal was to increase the visibility of the NC State insect collection. Metrics indicate the project has been highly successful: the drawers have been viewed about 350,000 times, or approximately 100 times per drawer.

Vlad Blagoderov (Natural History Museum, London) reported on insect digitization at the Natural History Museum (NHM). The museum contains about 30 million specimens and 120,000 drawers representing 700,762 species. Over 300,000 specimen records have been digitized so far. NHM’s challenge is to digitize about 20 million specimens within the next 5 years. Part of the solution is the SatScan whole-drawer imaging system. This rapid-throughput technology is supported by an optimized workflow and paired with software designed to make entering specimen-record data from images efficient.

Nicola Ferrier is an engineer at Argonne National Laboratory, a collaborative venture between the University of Chicago and the U.S. Department of Energy. She has teamed up with the Field Museum to explore using optical character recognition (OCR) to extract data from pinned insect labels without disturbing the specimens. The team is experimenting with methods for aligning and registering several images taken from multiple perspectives. The goal is a composite label image suitable for OCR. Nicola presented an idealized label automation workflow in which iterative OCR extraction, training, and authority file construction continually improve OCR output.

During lunch, participants were encouraged to tour the Field Museum on their own or to join guided collections tours led by FMNH staff. Beka Baquiran, Jim Boone, and Jim Louderman, all of FMNH, served as tour guides. After lunch, participants were treated to several demonstrations. These included the InvertNet robot, Leica and Olympus microscope camera stations, the Hirox microscope imaging system, and the Passport system developed for the Southwestern Collections of Arthropods Network (SCAN) TCN. A Microptics imaging station and a new system offered by Ortery were also on display. Each demonstration station was staffed by an expert who fielded a plethora of questions and observations.

The late afternoon session featured a set of workflow presentations. Gil Nelson outlined general guidelines for designing workflows and protocols. He was followed by Larry Gall (Yale Peabody Museum), Brian Fisher (California Academy of Sciences), and Paul Flemons (Australian Museum, Sydney).

Larry offered workflow perspectives from Peabody entomology. He traced the history of the museum’s collections and digitization initiatives through to modern-day implementations. He demonstrated how EMu software forms the backbone of much of the Peabody’s workflow and outlined the process of pre-digitization curation. He also highlighted novel methods for sorting and arranging drawers in preparation for databasing. Finally, he underscored the important role student technicians play in digitization and showed how collections digitization at the Peabody interfaces with Yale’s digital asset infrastructure.

Brian focused on the development of AntWeb, the world’s largest online database of images, specimen records, and natural history information on ants. As of March 2013, AntWeb held over 107,000 ant images representing 10,549 species, along with more than 354,000 specimen records. The website includes an online ant catalog, an important contribution to ant systematics. The catalog is also useful to anyone seeking authority files for building insect databases. Brian reviewed the camera systems and databasing strategies used by AntWeb, as well as its online curation tools.

Paul Flemons’ expertise in biodiversity digitization is extensive, and much of it rests on citizen science and public participation in digitization activities. Paul outlined a workflow built on the transcription of label images by online volunteers. He noted the importance of images as digital vouchers and their role in reducing specimen handling. He also explained how images let the stages of digitization be decoupled, so projects can benefit from a division of labor. A paper co-authored by Paul and Penny Berents is available via the workshop wiki.

The final day began with Larry Gall’s strategies for selecting, managing, and motivating digitization personnel. This was followed by an excellent period of discussion, reflection, and observation on the previous day’s activities and lessons. Larry encouraged participants to reflect on the task at hand, create reusable solutions, copy what works, and develop discrete iterative tasks. He also urged them to remain flexible and listen closely to technician feedback. He then offered a number of vignettes to illustrate these points.

The remainder of the morning focused on databasing and database management systems. iDigBio’s Joanna McCaffrey led off by outlining major considerations for choosing and adopting collections management software. Presentations on the salient features of specific software solutions followed:

- Specify, presented by Andy Bentley (University of Kansas)
- Symbiota, presented by Ed Gilbert (Arizona State University)
- EMu, presented by Beka Baquiran (FMNH)
- Arctos, presented by Derek Sikes (University of Alaska)
- Arthropod Easy Capture, the American Museum of Natural History’s management system, presented by Signe Valentinsson (AMNH)

These presentations prompted a spirited exchange as participants discussed the database characteristics most important to insect specimen databasing.

Following lunch and another round of onsite guided and self-guided tours, Jennifer Thomas (University of Kansas) outlined KU’s approach to pre-digitization curation. KU’s entomology collection has now databased about 950,000 specimens, virtually all of them stored in Specify. Jennifer described KU’s process for setting priorities. She also explained how KU generates and validates taxonomic authority files using numerous community resources. She then explained KU’s plan for physically re-curating drawers and unit trays before data entry. The plan includes affixing pin-label barcodes to each databased specimen, imaging the labels, and importing the resulting images into Specify. Jennifer’s talk prompted numerous questions and observations about the benefits of label imaging. It also sparked a stimulating discussion of specimen-level identifiers.

The remainder of the afternoon included several talks by iDigBio’s Deb Paul and Joanna McCaffrey on data cleaning, specimen identifiers, and strategies for serving data and images on the web. Gil Nelson then gave a brief overview of georeferencing tools and practices. The workshop wrapped up with highlights from the discussions and observations. A group photo session followed in front of the dinosaur exhibit in Stanley Field Hall on the museum’s main floor.