Known OCR, ML, NLP Issues

Latest revision as of 18:30, 31 January 2013

Specific Issues Needing Work

This page is an ongoing list of known topics where work is needed to improve OCR output, overall parsing results, and the creation of meaningful data sets for digitization and human-in-the-loop data transcription.

Please add to the list.

  1. how to get OCR to ignore a map (reduce OCR confusion)
  2. ... and ___ present a challenge and confuse OCR and parsing.
  3. figure out an algorithm that would separate images into sets with no handwriting, little handwriting (mostly typed or printed text), and lots of handwriting
  4. hi-tech parsers need user interfaces
  5. setting up a service-based architecture, see AOCR SaaS
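
As a starting point for item 3, the final grouping step might be sketched as below. This is only an illustration, not a worked-out method: it assumes some upstream step (not specified on this page) has already estimated, for each image, the fraction of text regions that are handwritten. The function name `bucket_by_handwriting`, the `little_max` cut-off, and its default of 0.3 are all hypothetical choices made for the example.

```python
from typing import Dict, List


def bucket_by_handwriting(
    handwriting_fraction: Dict[str, float],
    little_max: float = 0.3,  # assumed cut-off between "little" and "lots"
) -> Dict[str, List[str]]:
    """Group image names into three sets by estimated handwriting fraction.

    handwriting_fraction maps an image name to a value in [0, 1]: the share
    of text regions judged handwritten by a hypothetical upstream estimator.
    """
    buckets: Dict[str, List[str]] = {"none": [], "little": [], "lots": []}
    # Sort by name so the output order is reproducible.
    for name, fraction in sorted(handwriting_fraction.items()):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"{name}: fraction {fraction} is outside [0, 1]")
        if fraction == 0.0:
            buckets["none"].append(name)
        elif fraction <= little_max:
            buckets["little"].append(name)
        else:
            buckets["lots"].append(name)
    return buckets
```

The hard part of item 3, estimating the handwriting fraction itself, is left open here; the sketch only shows how images could be sorted into the three sets once such an estimate exists, and the cut-off would need tuning against real label images.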

If you have worked out one or more methods to address a given issue, please link from here to a page that explains your strategy, with examples where possible.

Back to the Hackathon Wiki