Hackathon Challenge: Difference between revisions


12 bytes added, 11 January 2013

@@ Line 15: / Line 15: @@
 **This set will also be used to evaluate the performance of the parsing algorithms.
 *Overfitting is a potential problem, so at the time of the hackathon we may provide additional testing records for evaluation.
+== Scope ==
 There are several potential types of input to the parsing algorithms. The most basic form is OCR text in UTF-8 format from multiple engines. Optionally, the OCR output may include exact spatial information about the location of characters on the original image, which allows some algorithms to exploit that information to identify elements. This format is, however, not a main focus of this hackathon. Those wishing to pursue other goals, such as image segmentation, finding specific elements, or improving the usability and user interfaces of the OCR and parsing tools, are also encouraged to do so and to report back to the group at the hackathon.
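 The two input types described above could be held in a single record per image. The following is only a minimal sketch under stated assumptions: the names OcrRecord, CharBox, engine_a and engine_b, the pixel-based bounding box layout, and the sample label text are all hypothetical and are not defined by the hackathon materials.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CharBox:
    """One recognized character and its bounding box (hypothetical layout, in image pixels)."""
    char: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class OcrRecord:
    """OCR output for one image: plain text per engine, plus optional spatial data."""
    record_id: str
    text_by_engine: Dict[str, str] = field(default_factory=dict)
    boxes_by_engine: Dict[str, List[CharBox]] = field(default_factory=dict)

    def add_text(self, engine: str, raw: bytes) -> None:
        # Decode strictly so that input that is not valid UTF-8 raises an error
        # instead of being silently altered.
        self.text_by_engine[engine] = raw.decode("utf-8")

    def has_spatial_info(self, engine: str) -> bool:
        # True only when at least one character box is stored for this engine.
        return bool(self.boxes_by_engine.get(engine))


# Made-up sample data: the same image read by two engines, one with spatial data.
record = OcrRecord(record_id="example-001")
record.add_text("engine_a", "Quercus alba L.\n".encode("utf-8"))
record.add_text("engine_b", "Quercus a1ba L.\n".encode("utf-8"))
record.boxes_by_engine["engine_a"] = [CharBox("Q", 10, 12, 8, 11)]
```

 Keeping the text of every engine side by side lets a parser compare engines on the same record, while the spatial data stays optional, in line with its lower priority for this hackathon.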
 Back to the [[2013 AOCR Hackathon Wiki| Hackathon Wiki]]