IConference 2013 iDigBio AOCR WG Wiki: Difference between revisions

m
Line 62:


===Notes (short paper)===
:::;Augmenting Optical Character Recognition (OCR) for Improved Digitization -- Strategies to Access Scientific Data in Natural History Collections. Deborah L Paul, P. Bryan Heidorn: Augmenting OCR Working Group (A-OCR WG) at Integrated Digitized Biocollections (iDigBio) seeks to improve community OCR strategies and algorithms for faster, better parsing of OCR output derived from valuable data on natural history collection specimen labels. This task is exceedingly difficult because museum labels are often annotated, and vary in content, form and font. Under the National Science Foundation's (NSF) Advancing Digitization of Biological Collections (ADBC) program, iDigBio is building a cyberinfrastructure to aggregate quality data from museum specimens housed in collections across the United States for use by researchers, educators, environmentalists and the public. Since March of 2012, the A-OCR WG formed from community consensus to begin its role in this endeavor, defining reachable goals including setting up a hackathon concurrent with iConference 2013. This paper reports on the definition of some key problems identified by the A-OCR WG since these science problems will drive research and cyberinfrastructure development.
:::;Paul, D., & Heidorn, P. B. (2013). Augmenting optical character recognition (OCR) for improved digitization: Strategies to access scientific data in natural history collections. iConference 2013 Proceedings (pp. 514-518). doi:10.9776/13266: Augmenting OCR Working Group (A-OCR WG) at Integrated Digitized Biocollections (iDigBio) seeks to improve community OCR strategies and algorithms for faster, better parsing of OCR output derived from valuable data on natural history collection specimen labels. This task is exceedingly difficult because museum labels are often annotated, and vary in content, form and font. Under the National Science Foundation's (NSF) Advancing Digitization of Biological Collections (ADBC) program, iDigBio is building a cyberinfrastructure to aggregate quality data from museum specimens housed in collections across the United States for use by researchers, educators, environmentalists and the public. Since March of 2012, the A-OCR WG formed from community consensus to begin its role in this endeavor, defining reachable goals including setting up a hackathon concurrent with iConference 2013. This paper reports on the definition of some key problems identified by the A-OCR WG since these science problems will drive research and cyberinfrastructure development.


===Alternative Event===