IConference 2013 iDigBio AOCR WG Wiki: Difference between revisions

===Panel Workshop ===
:::Integrated Digitized Biodiversity Collections (iDigBio) is an initiative funded under the National Science Foundation's (NSF) Advancing Digitization of Biological Collections (ADBC) program. It was set up to help natural history museums get data for hundreds of millions of specimens out of drawers, off labels, out of field notebooks and old publications, and into integrated databases for everyone's use. The iDigBio Augmenting OCR Working Group needs your wisdom, knowledge, and collaboration as part of our multi-faceted approach to improving the OCR strategies and natural language processing (NLP) algorithms used in digitization. Our workshop panelists, five members of our working group, are eager to introduce the iSchools community to our challenges and get your input in our break-out sessions. Our research areas of interest include image segmentation, autocorrection of typographical errors, semantic autocorrection, autonormalization, automated text segmentation, generating consensus records, and user interfaces for these tasks. We seek your insights, collective experience, and partnership to find ways to improve the digitization process and create a national, searchable, online specimen-based data set that is fit for use by scientists and the public. Some ideas generated in this session may be implemented at the iDigBio hackathon being held at the Botanical Research Institute of Texas (BRIT) during the iConference.
==== Five Panelists' Talks====
::::::Deborah Paul: Introducing iDigBio and the Augmenting OCR Working Group
 
::::::Amanda Neill: Digitization of biocollections: a grand challenge in scope, scale, and significance
 
::::::Jason Best: The Apiary Project: a workflow for text extraction and parsing for herbarium specimens
 
::::::Edward Gilbert: Symbiota: Creating an OCR- and NLP-enabled user interface and workflow to efficiently digitize 2.3 million lichen and bryophyte specimens
 
::::::Bryan Heidorn: HERBIS/LABELX: Machine Learning Approach to Parsing OCR Text
 
::::::John Mignault: Linking Data: Biodiversity Heritage Library: supporting knowledge discovery from digitized content


===Poster===