Workshop at iSchools Conference 2013
The iDigBio Augmenting OCR (AOCR) Working Group has put together two upcoming events as part of its efforts to update community knowledge about what is possible with Optical Character Recognition (OCR) and Natural Language Processing (NLP), and to find collaborators to spur further development of effective OCR and NLP strategies in digitization. First, the AOCR working group successfully submitted a proposal to the 2013 iSchools Conference for a half-day workshop entitled "Help iDigBio Reveal Hidden Data: iDigBio Augmenting OCR Working Group Needs You." Through panel presentations, several key members of the working group plan to introduce the iSchools community to the scope and challenges of digitizing natural history collections, with a focus on the use of OCR and NLP. The planned presentations cover a broad range of perspectives, including those of the curator, the researcher, the public user interface, and aggregators. Using break-out groups, the AOCR working group hopes to find new insights and partners.
Second, to further define effective OCR and NLP strategies and better delineate challenges, the AOCR working group set up a hackathon, concurrent with the iSchools 2013 Conference. Working with herbarium sheet, bryophyte packet, and entomology label sets, teams that pair programmers with end users will be presented with two challenges: 1) to use the OCR software of their choice to output the cleanest OCR possible and 2) to use NLP algorithms to parse the OCR output into Darwin Core fields. Participants' outputs will be compared and scored, with results shared via the iDigBio Wiki.
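To give a rough sense of what the second challenge involves, the following is a minimal sketch in Python, not the working group's method. It assumes an invented sample label, a very simple rule-based parser (regular expressions rather than a full NLP pipeline), and a hypothetical scoring function that counts exact field matches against a hand-keyed record; the announcement does not specify how outputs will actually be scored. The dictionary keys are Darwin Core term names; `parse_label` and `field_accuracy` are illustrative names only.

```python
import re

# Invented OCR output for a specimen label (illustration only, not real data).
SAMPLE_OCR = """Cladonia rangiferina
Alachua County, Florida
Coll. J. Smith  No. 1234
12 May 1987"""

# Map three-letter month abbreviations to month numbers.
MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_label(text):
    """Parse OCR text into a few Darwin Core fields using simple rules.

    Assumption: the first non-empty line is the scientific name, the collector
    line looks like 'Coll. <name> No. <number>', and the date is written as
    'day month year'. Real labels vary far more than this.
    """
    record = {}
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines:
        record["scientificName"] = lines[0]

    m = re.search(r"Coll\.\s*(.+?)\s+No\.\s*(\d+)", text)
    if m:
        record["recordedBy"] = m.group(1)
        record["recordNumber"] = m.group(2)

    m = re.search(r"\b(\d{1,2})\s+([A-Za-z]{3})[a-z]*\.?\s+(\d{4})\b", text)
    if m and m.group(2).lower() in MONTHS:
        month = MONTHS[m.group(2).lower()]
        record["eventDate"] = f"{m.group(3)}-{month:02d}-{int(m.group(1)):02d}"
    return record


def field_accuracy(parsed, gold):
    """Hypothetical score: fraction of gold fields reproduced exactly."""
    if not gold:
        return 0.0
    hits = sum(1 for key, value in gold.items() if parsed.get(key) == value)
    return hits / len(gold)


if __name__ == "__main__":
    parsed = parse_label(SAMPLE_OCR)
    gold = {"scientificName": "Cladonia rangiferina",
            "recordedBy": "J. Smith",
            "recordNumber": "1234",
            "eventDate": "1987-05-12"}
    print(parsed)
    print(field_accuracy(parsed, gold))
```

Exact-match scoring is deliberately strict; a real comparison would probably need to tolerate OCR noise, for example by using an edit-distance measure per field.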