== Scope ==
*There are several potential types of input to the parsing algorithms.
**The most basic form of input is OCR text in UTF-8 format from multiple engines.
**There may optionally be OCR with exact spatial information about the location of characters on the original image.
***This allows some algorithms to exploit spatial information to identify elements. This format is, however, not a main focus for this hackathon.
**Also, those wishing to pursue other goals, such as image segmentation, finding specific elements, or improving the usability and user interfaces of the OCR and parsing tools, are encouraged to do so and report back to the group at the hackathon.
Back to the [[2013 AOCR Hackathon Wiki|Hackathon Wiki]]