Augmenting OCR: Difference between revisions

[[Category:Workflow]]
[[Category:Protocol]]
[[Category:OCR]]
[[Category:aOCR]]
[[Category:Digitization]]


=== Augmenting OCR Working Group (A-OCR) Overview ===
This is a community-derived encyclopedia of information about the Augmenting OCR working group and its efforts. Members of the working group and the wider community are encouraged to work together to develop content and deliverables that will serve the broader digitization community.
=== A-OCR Goals ===
The working group is focused on assembling materials that help the community get more from their OCR strategies. Topics we are gathering material on include:


*reporting findings after working with real image data and programmers to improve parsing of OCR output.<br>
*lists of OCR software currently being utilized by the natural history collections community.<br>
*documenting and facilitating new uses for OCR output.<br>
=== Working Group History ===
The Augmenting OCR working group grew out of the first iDigBio Summit in 2011, where community consensus called for improving OCR output and algorithmic tools to speed up the digitization process. Initial group members were suggested by those present at Summit 2011.
Our current working group met in person at a [https://www.idigbio.org/wiki/index.php/IDigBio_Augmenting_OCR_Workshop workshop in October 2012] to learn more about each other's use of OCR and OCR output, plan a February 2013 hackathon, develop content for our iSchools 2013 iConference workshop panel, and learn about the latest developments in handwriting recognition. We put together iDigBio's first [http://tinyurl.com/idigbioaocrhackathon hackathon] and presented our working group plans at [http://www.ischools.org/iConference13/workshop11/v iConference13]. In December 2013, at the [https://www.idigbio.org/content/citscribe-hackathon iDigBio CITScribe hackathon], the aOCR working group built working demos of visualization tools that facilitate searching of OCR output. These tools also offer ways to create record sets for citizen scientists, researchers, and data validators. A recorded demonstration webinar, [https://www.idigbio.org/content/demo-webinar-visualize-text-data-using-ocr-output Visualize Text Data From OCR Output], is set up for 22 January 2014 to spread the word about how these tools fit into a digitization workflow.
=== OCR Related Materials ===


Check out the following pages. We welcome your input!<br>
*[[Technical Issues]]
*[[OCR / NLP Workflows]]
*[[Digitization Projects Using OCR / ML / NLP]]


= Augmenting OCR Events, Outreach and Education  =
*Workshops
**[[IDigBio Augmenting OCR Workshop|iDigBio Augmenting OCR Workshop]] - Our working group is meeting in Gainesville, Florida, October 1 - 2, 2012. We've put together an exciting and challenging meeting agenda.
*[[iSchools2013| iSchools Conference 2013]]
**[[iConference 2013 iDigBio AOCR WG Wiki]]
**The iDigBio AOCR working group successfully proposed a half-day workshop put together by members of our working group, see [http://www.ischools.org/iConference13/workshop11/ Help iDigBio Reveal Hidden Data: iDigBio Augmenting OCR Working Group Needs You!], February 12 in Fort Worth, Texas.
*Hackathon
**[[2013 AOCR Hackathon Wiki]]
**Our working group is planning a [http://tinyurl.com/aocrHack Hackathon], concurrent with the above iSchools workshop, to reach beyond the borders of the natural history collections community to those with the skills needed to improve and enhance our existing strategies for running OCR on images and parsing OCR output. The Hackathon is scheduled for February 13 - 14 at the Botanical Research Institute of Texas (BRIT). Members of the natural history community will be invited to participate, and a general call will go out to the broader community for interested applicants in early to mid November of 2013.
**[http://tinyurl.com/iDigBioAOCRHackathon Application to Participate in the iDigBio 2013 Hackathon]
*[https://www.idigbio.org/content/citscribe-hackathon iDigBio CITScribe Hackathon]
**The iDigBio AOCR working group participated in iDigBio's second hackathon, the [https://www.idigbio.org/content/citscribe-hackathon iDigBio CITScribe Hackathon]. Working group members Andrea Matsunaga, Jason Best, William Ulate, Reed Beaman, and Deborah Paul were joined by Miao Chen (Syracuse University, currently Visiting Scholar at Indiana University's Data to Insight Center) and [http://botany.si.edu/staff/staffpage.cfm?thisName=68 Sylvia Orli] (Information Management & Research Support, Botany Department at the Smithsonian).
= Links =
*[http://idigbio.adobeconnect.com/augmentocr/ Augmenting OCR Adobe Connect Meeting Space]
*[https://idigbio.adobeconnect.com/admin/content?sco-id=1164702751&tab-id=1130716097 Adobe Connect Recorded Content]
*[https://www.idigbio.org/wiki/index.php/IDigBio_Workshops All iDigBio Workshops in Summary]
----
Please let us know if you need assistance modifying this page: [https://www.idigbio.org/contact/Website_feedback iDigBio Help Desk]

Also, if you would like to learn more about wiki syntax: [http://meta.wikimedia.org/wiki/Help:Wikitext_examples Mediawiki Wikitext Examples]
 
<br>

Latest revision as of 17:59, 18 June 2014


Augmenting OCR Working Group (A-OCR) Overview

This is a community-derived encyclopedia of information about the Augmenting OCR working group and its efforts. Members of the working group and the wider community are encouraged to work together to develop content and deliverables that will serve the broader digitization community.

A-OCR Goals

The working group is focused on assembling materials that help the community get more from their OCR strategies. Topics we are gathering material on include:

  • known effective practices for getting the most from any OCR software.
  • known issues that hinder good (useful) OCR output.
  • reporting findings after working with real image data and programmers to improve parsing of OCR output.
  • lists of OCR software currently being utilized by the natural history collections community.
  • documenting and facilitating new uses for OCR output.

Working Group History

The Augmenting OCR working group grew out of the first iDigBio Summit in 2011, where community consensus called for improving OCR output and algorithmic tools to speed up the digitization process. Initial group members were suggested by those present at Summit 2011.

Our current working group met in person at a workshop in October 2012 to learn more about each other's use of OCR and OCR output, plan a February 2013 hackathon, develop content for our iSchools 2013 iConference workshop panel, and learn about the latest developments in handwriting recognition. We put together iDigBio's first hackathon and presented our working group plans at iConference13. In December 2013, at the iDigBio CITScribe hackathon, the aOCR working group built working demos of visualization tools that facilitate searching of OCR output. These tools also offer ways to create record sets for citizen scientists, researchers, and data validators. A recorded demonstration webinar, Visualize Text Data From OCR Output, is set up for 22 January 2014 to spread the word about how these tools fit into a digitization workflow.

OCR Related Materials

Check out the following pages. We welcome your input!

Augmenting OCR Events, Outreach and Education

  • Workshops
    • iDigBio Augmenting OCR Workshop - Our working group is meeting in Gainesville, Florida, October 1 - 2, 2012. We've put together an exciting and challenging meeting agenda.
  • Hackathon
    • 2013 AOCR Hackathon Wiki
    • Our working group is planning a Hackathon, concurrent with the above iSchools workshop, to reach beyond the borders of the natural history collections community to those with the skills needed to improve and enhance our existing strategies for running OCR on images and parsing OCR output. The Hackathon is scheduled for February 13 - 14 at the Botanical Research Institute of Texas (BRIT). Members of the natural history community will be invited to participate, and a general call will go out to the broader community for interested applicants in early to mid November of 2013.
    • Application to Participate in the iDigBio 2013 Hackathon
  • iDigBio CITScribe Hackathon
    • The iDigBio AOCR working group participated in iDigBio's second hackathon, the iDigBio CITScribe Hackathon. Working group members Andrea Matsunaga, Jason Best, William Ulate, Reed Beaman, and Deborah Paul were joined by Miao Chen (Syracuse University, currently Visiting Scholar at Indiana University's Data to Insight Center) and Sylvia Orli (Information Management & Research Support, Botany Department at the Smithsonian).

Links


Please let us know if you need assistance modifying this page: iDigBio Help Desk

Also, if you would like to learn more about wiki syntax: Mediawiki Wikitext Examples