Digitization of the ETH Entomological Collection

Thu, 2017-10-05 09:35 -- maphillips

Contributed by: Rod Eastwood, Curator, Entomological Collection, Eidgenössische Technische Hochschule (ETH) Zürich, Institut für Agrarwissenschaften, Biocommunication & Entomology, Zürich, Switzerland

The ETH Zurich Entomological Collection was founded in 1858 and currently ranks about fourth largest in Switzerland.  With around 2 million specimens it is by no means a large collection; however, it includes some important historical material from Heinrich Escher (1776-1853), Oswald Heer (1809-1883), and Anton von Schulthess (1855-1941), to name a few.  The focus is on Swiss and central European taxa with important collections of Hymenoptera, Coleoptera, and Lepidoptera.  There are about 5,000 type specimens.  The ETH Entomological Collection website is here.

In 2015 ETH Zurich initiated a digitization program for all its collections to coordinate efforts and feed data into an open-access central corporate database.  The Entomological Collection, under the direction of Prof. Consuelo De Moraes, was awarded CHF 500,000 in July 2016 for a three-year project (Project IMAGO) to upgrade the collection and fully database 150,000 Palaearctic Macrolepidoptera.

Preparation of the collection began in August 2016 with the removal of old cork-lined drawers and re-housing of specimens into new drawers with Plastazote®-lined unit trays.  At the same time, five existing Lepidoptera collections (~200,000 specimens) and eight collections donated in 2016 (35,000 specimens) were integrated.  After culling specimens with poor-quality data, there remained a core Palaearctic Macrolepidoptera collection of about 165,000 specimens.  Our goal was to digitize and database approximately 50,000 specimens per year for three years.

Fig. 1.  ETH Entomological Collection imaging workstation.

We finished building our imaging workstation in December 2016.  Specifications include a Canon EOS 760D camera body with a Canon EF-S 60mm f/2.8 Macro lens and remote shutter control, a Kaiser camera stand and four Kaiser RB 218N HF 5464 lights (5400 Kelvin) (see Figs. 1 & 2).  DataShot® software (developed by Paul J. Morris at the MCZ, Harvard University) was adapted to the ETH environment with a few modifications.  The DataShot® system is ideal for medium-sized insect collections or sub-collections where it is feasible to fully transcribe pin label data from each specimen.  Briefly, the concept is to photograph the pin labels together with the specimen, a unique identifier (UID) tag and a taxonomic data label; the tag and the taxonomic label each carry a QR code that is machine-read directly into the database fields.  Thus, records are searchable in the database by species name, etc., as soon as the images are processed.  Each record in the database displays the image, so data entry is done directly from the high-resolution images without further handling of the specimen.  Further information on DataShot® is available here.
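
To make the machine-reading step concrete, here is a minimal Python sketch of how text already decoded from the two QR codes might be mapped onto database fields.  It is not DataShot® code.  The JSON payload layout, the key names (family, genus, species, authorship), the UID pattern and the function name build_record are all assumptions made for illustration, and decoding the QR codes from the image is assumed to have happened beforehand.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical UID format: the prefix "ETHZ-ENT" followed by seven digits.
UID_PATTERN = re.compile(r"^ETHZ-ENT\d{7}$")


@dataclass
class SpecimenRecord:
    uid: str
    family: str
    genus: str
    species: str
    authorship: str = ""


def build_record(uid_text: str, taxon_text: str) -> SpecimenRecord:
    """Combine the decoded UID tag and taxon label into one record."""
    uid = uid_text.strip()
    if not UID_PATTERN.match(uid):
        raise ValueError(f"unexpected UID format: {uid!r}")
    taxon = json.loads(taxon_text)  # assumed JSON payload, not a documented format
    return SpecimenRecord(
        uid=uid,
        family=taxon["family"],
        genus=taxon["genus"],
        species=taxon["species"],
        authorship=taxon.get("authorship", ""),
    )


# Illustrative input only; the UID value is made up.
record = build_record(
    "ETHZ-ENT0012345",
    '{"family": "Nymphalidae", "genus": "Vanessa", '
    '"species": "atalanta", "authorship": "(Linnaeus, 1758)"}',
)
print(record.uid, record.genus, record.species)
```

Because the taxon fields are filled from the label rather than typed, a record built this way would be searchable by name before any pin label has been transcribed, which is the property the text describes.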

Fig. 2.  Detail of the imaging carriage.

Imaging and processing of images into our database commenced in March 2017 using part-time students and a fixed-term full-time civil service employee.  We quickly increased our imaging rates as we refined the protocols, and by the end of April 2017 we were imaging up to 1,000 specimens per day at our single workstation.  Allowing for students' exams and holidays, etc., and some upstream constraints such as collection preparation, we processed 72,000 specimen records (82,000 images, including labels with data on both sides) in our first seven months of operation.  We expect to maintain a rate of about 10,000 records per month for the foreseeable future.
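
The figures in this paragraph can be checked with a little arithmetic.  The short Python sketch below uses only numbers quoted in the article.  Reading the additional images as second sides of labels, and projecting the 150,000-specimen target at a flat 10,000 records per month, are assumptions rather than statements from the project.

```python
records = 72_000        # specimen records processed (from the text)
images = 82_000         # images taken (from the text)
months = 7              # first seven months of operation
target = 150_000        # project target for Macrolepidoptera (from the text)
expected_rate = 10_000  # expected records per month (from the text)

# Average monthly throughput actually achieved.
records_per_month = records / months

# Images beyond one per record; assumed to be second sides of labels.
extra_images = images - records

# Months needed for the full target at the expected rate (assumes a flat rate).
months_for_target = target / expected_rate

print(round(records_per_month))  # 10286
print(extra_images)              # 10000
print(months_for_target)         # 15.0
```

The achieved average of roughly 10,300 records per month is consistent with the expected rate of 10,000 per month.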

Imaging station personnel are trained over about four or five imaging sessions and follow a set protocol at the imaging station to minimize the potential for mistakes.  Special tools are provided to reduce handling times and the risk of specimen damage, e.g. see page 18 of the SPNHC Newsletter for a note on modified forceps.  Most imaging personnel routinely repair loose wings and abdomens before imaging, cull specimens with incomplete data (to be confirmed by the supervisor), and require little or no supervision.

Imaging workstation protocols:

  • We photograph only the uppermost side of each pinned specimen since our purpose is to database the collection, not to provide an identification tool.  Photographing the underside as well would double our imaging times, but see also the next point.
  • Accurate species determinations are crucial, so we use only recognized experts to confirm the identification of each taxon prior to digitization.
  • Other workstation protocols include:
    • Photograph both sides of pin labels if data are recorded on each side.
    • Order of labels on the pin is maintained in the image.
    • Final label is always the UID tag, which is laser-printed on both sides of 826 Speci-Mark® archival paper; the paper is firm on the pin and reduces the risk of old labels falling off.

Transcription of pin label data into the database had a slower start, but by early August 2017 we had debugged the software and made some changes to suit our particular requirements.  Two of our data entry personnel proved to be very adept at transcribing data, and our current transcription rate including geo-coordinates is less than 1.4 min per record over a four-hour shift (n = 3,500 records).  As a result of the speed and quality of data entry, we intend to keep this operation "in-house" for now rather than farm it out to community sourcing or other third parties.  We expect new personnel will achieve similar data-entry rates with additional modifications to DataShot® and as we continue to develop and expand our important resources for data entry, described below.  Time will tell!  The data-entry screen is shown in Fig. 3.

Fig. 3.  Screenshot of DataShot® data-entry page/specimen record.  Many fields are linked to database records and present auto-fill options.  Each database record features the image of primary data.

Data entry personnel share the same office as the collection manager/digitization manager and other curatorial personnel and are encouraged to ask questions about data fields and the interpretation of pin label data.  Questions quickly taper off as data entry personnel become more experienced.

Fig. 4.  Spreadsheet showing recurring verbatim locality data and geo-coordinates.  Currently 1837 records.

Important resources for data entry personnel:

  • A spreadsheet with searchable lists of recurring symbols, abbreviations and other label markings, together with a description of their meaning and the fields in which the data should be entered.
  • Collector lists with verbatim and full name, where they lived (& collected); birth & death dates; and other relevant information to standardize collector names and assist in interpreting label data.
  • Searchable list of verbatim place names with the current spelling, canton/state, country and geo-reference (WGS84) coordinates that can be quickly copied into the relevant fields.  The lists are updated daily by data entry personnel (Fig. 4).
  • Bookmarked web-based resources such as the European Geoportals Swiss Gazetteer map, where coordinates for previously unrecorded place names, both obsolete and new, can be determined and copied (Fig. 5).
  • Hard copies of collectors' journals from our archive and society membership lists of Swiss and other European collectors are on hand and often help to resolve difficult labels.
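
The place-name list above is essentially a lookup table from the text as written on a label to a standardized locality.  The following Python sketch shows one way such a table could be modelled.  It is not the spreadsheet the team uses.  The class and field names are hypothetical, the matching rule (ignore case and extra whitespace) is an assumption, and the single entry is illustrative with approximate coordinates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Place:
    current_name: str
    canton: str
    country: str
    lat: float  # WGS84 decimal degrees
    lon: float  # WGS84 decimal degrees


def normalise(verbatim: str) -> str:
    """Fold case and collapse whitespace so label variants compare equal."""
    return " ".join(verbatim.casefold().split())


class Gazetteer:
    """Maps verbatim label place names to standardized localities."""

    def __init__(self) -> None:
        self._places: Dict[str, Place] = {}

    def add(self, verbatim: str, place: Place) -> None:
        self._places[normalise(verbatim)] = place

    def lookup(self, verbatim: str) -> Optional[Place]:
        return self._places.get(normalise(verbatim))


gazetteer = Gazetteer()
# Illustrative entry; coordinates are approximate.
gazetteer.add("Zurich", Place("Zürich", "ZH", "Switzerland", 47.377, 8.542))

hit = gazetteer.lookup("  ZURICH ")
print(hit.current_name if hit else "not yet recorded")
```

A lookup that returns nothing corresponds to a previously unrecorded place name, which would then be resolved with a gazetteer map and added to the list.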

Fig. 5.  Swiss Gazetteer map showing geo-coordinates for a particular location.

The DataShot® system of imaging specimens and labels, then transcribing data from the images into the database, is very efficient, accurate and specimen friendly.  But equally important is the quality of personnel.  All our students are well paid and motivated.  Most have been working in the collection for a year or more and all were keen to transition onto specimen imaging when we started.  All of our civil service employees brought additional skills that helped us refine our protocols or solve software problems.  We also have several volunteer retiree entomologists who perform important tasks in the workflow.  Low employee turnover, an inclusive working environment and a varied daily work routine make for more efficient and highly motivated employees.  The importance of good quality staff cannot be overstated.

We were fortunate that Paul Morris could answer some software questions during our setup and we have excellent support from ETH Collections & Archives, especially in providing extra funds for materials and specialist entomologists so that we can maintain our current production rates.

A video showing our digitization process can be viewed here:

Contact Rod Eastwood