iConference2013: Augmenting optical character recognition (OCR) for improved digitization: Strategies to access scientific data in natural history collections

Title: iConference2013: Augmenting optical character recognition (OCR) for improved digitization: Strategies to access scientific data in natural history collections
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Paul, Deborah L., and P. Bryan Heidorn
Conference Name: iConference 2013
Date Published: 02/2013
Publisher: iSchools
Conference Location: Ft. Worth, Texas
Keywords: EF-1115210, iConference2013, iDigBio, information analysis, information organization, information retrieval, information services, machine language, Natural Language, OCR, qualitative data analysis, research methods
Abstract: The Augmenting OCR Working Group (A-OCR WG) at Integrated Digitized Biocollections (iDigBio) seeks to improve community OCR strategies and algorithms for faster, better parsing of the OCR output derived from the valuable data on natural history collection specimen labels. This task is exceedingly difficult because museum labels are often annotated and vary in content, form and font. Under the National Science Foundation's (NSF) Advancing Digitization of Biological Collections (ADBC) program, iDigBio is building a cyberinfrastructure to aggregate quality data from museum specimens housed in collections across the United States for use by researchers, educators, environmentalists and the public. The A-OCR WG formed through community consensus in March 2012 to take on its role in this endeavor and has since defined reachable goals, including organizing a hackathon concurrent with iConference 2013. This paper reports on the definition of several key problems identified by the A-OCR WG, since these science problems will drive research and cyberinfrastructure development.
URL: https://www.ideals.illinois.edu/bitstream/handle/2142/39427/266.pdf?sequence=4
DOI: 10.9776/13266