IConference 2013 iDigBio AOCR WG Wiki: Difference between revisions

Line 30: Line 30:


===Poster===
:::'''Improving the Character of Optical Character Recognition (OCR): iDigBio Augmenting OCR Working Group Seeks Collaborators and Strategies to Improve OCR Output and Parsing of OCR Output . . .'''
:::There are an estimated 2 to 3 billion museum specimens worldwide (OECD 1999, Ariño 2010). In an effort to increase the research value of their collections, institutions across the U.S. have been seeking new ways to cost-effectively transcribe the label information associated with these specimens. Current digitization methods are still relatively slow and labor-intensive, and therefore expensive. New methods, such as optical character recognition (OCR), natural language processing, and human-in-the-loop assisted parsing, are being explored to reduce these costs. In 2011 the National Science Foundation (NSF), through the Advancing Digitization of Biodiversity Collections (ADBC) program, funded Integrated Digitized Biocollections (iDigBio) to create a Home Uniting Biodiversity Collections (HUB) cyberinfrastructure. The HUB is intended to aggregate and integrate specimen data, to find ways to digitize specimen data faithfully and faster, and to disseminate the knowledge of how to achieve this. The iDigBio Augmenting OCR Working Group is part of this national effort.

