Hackathon Challenge: Difference between revisions



== Scope ==
*There are several potential types of input to the parsing algorithms.  
**The most basic form of input is OCR text in UTF-8 format from multiple engines.  
**There may optionally be OCR with exact spatial information about the location of characters on the original image.  
***This will allow some algorithms to exploit spatial information to identify elements. This format is, however, not a main focus for this hackathon.  
**Participants who wish to pursue other goals, such as image segmentation, finding specific elements, or improving the usability and user interfaces of the OCR and parsing tools, are encouraged to do so and to report back to the group at the hackathon.
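
The input types listed above could be represented along the following lines. This is only a minimal sketch, not a format defined for the hackathon: the class names <code>CharBox</code> and <code>OcrOutput</code>, their fields, and the engine labels are all hypothetical and chosen for illustration. It assumes each engine delivers its text as UTF-8 bytes and that per-character spatial information, when present, is a list of pixel bounding boxes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharBox:
    """Hypothetical bounding box of one character on the original image (pixels)."""
    char: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class OcrOutput:
    """Hypothetical container for one OCR engine's output for one image."""
    engine: str                           # illustrative label, e.g. "engine_a"
    text: str                             # OCR text, decoded from UTF-8
    boxes: Optional[List[CharBox]] = None  # optional spatial information

    @classmethod
    def from_bytes(cls, engine: str, raw: bytes,
                   boxes: Optional[List[CharBox]] = None) -> "OcrOutput":
        # The basic input form is UTF-8 text; decoding fails loudly on bad bytes.
        return cls(engine=engine, text=raw.decode("utf-8"), boxes=boxes)

    @property
    def has_spatial_info(self) -> bool:
        # True only when character positions were supplied with the text.
        return self.boxes is not None


# Text-only output from one engine, and text with positions from another.
plain = OcrOutput.from_bytes("engine_a", "Carl Linné".encode("utf-8"))
spatial = OcrOutput.from_bytes("engine_b", b"Q",
                               boxes=[CharBox("Q", 10, 20, 8, 12)])
```

A parser written against such a structure can handle the text-only case by default and check <code>has_spatial_info</code> before using character positions.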


Back to the [[2013 AOCR Hackathon Wiki| Hackathon Wiki]]