SPNHC 2014: Progress in Digitization: Incorporating OCR into a digitisation and curation workflow

Thu, 2014-05-29 14:03 -- ellwood
Publication Type: Presentation
Year of Publication: 2014
Authors: Haston, Elspeth, Drinkwater Robyn, and Cubey Robert
Keywords: digitization, optical character recognition, SPNHC 2014, SPNHC 2014: Progress in Digitization, workflow
Abstract: The digitisation of natural history collections is a priority for many institutes, for reasons including opening access to users around the world, incorporating specimen data into research, and disaster planning. However, digitisation of natural history specimens is expensive and labour intensive. For this reason, digitisation of specimens has moved towards minimal data capture and imaging as part of large-scale processes. The workflow can be summarised in the following steps: 1) minimal curation; 2) attach barcode as a unique identifier; 3) minimal data entry; 4) image specimen; 5) additional data entry. Some institutes have investigated the use of Optical Character Recognition (OCR) within the digitisation and curation workflow. The Royal Botanic Garden Edinburgh (RBGE) now routinely processes all specimen images through OCR software. The OCR process is integrated into the overall digitisation and curation workflow and has been used to speed up the process of adding data to over 100,000 specimens. The following additional steps have been incorporated into the workflow at RBGE: 4a) assess condition of specimen; 4b) process image through OCR software; 5a) additional curation. The incorporation of OCR into digitisation workflows is being explored by the Synthesys project, funded by the European Union within Framework 7, and by iDigBio, funded by the United States Government within the National Science Foundation programme. The work being carried out at RBGE to integrate OCR into the digitisation and curation workflow is discussed as part of the work of Synthesys and iDigBio.