Hackathon Challenge

**This set will also be used to evaluate the performance of parsing algorithms.
*Overfitting is a potential problem, so at the time of the hackathon we may provide additional testing records for evaluation.


== Scope ==
There are several potential types of input to the parsing algorithms. The most basic form of input is OCR text in UTF-8 format from multiple engines. Optionally, the OCR output may include exact spatial information about the location of characters on the original image, which allows some algorithms to exploit spatial information to identify elements. This format is, however, not a main focus of this hackathon. Those wishing to pursue other goals, such as image segmentation, finding specific elements, or improving the usability and user interfaces of the OCR and parsing tools, are encouraged to do so and report back to the group at the hackathon.


Back to the [[2013 AOCR Hackathon Wiki| Hackathon Wiki]]