Hackathon FAQ

:::; 5. (DL) '''Probable Corrections: When information is not certain (e.g. recordedBy, scientificName), is it better to guess or to omit?''' : Answer: '''aocr:verbatimScientificName''' preserves the original text as captured from the OCR output. The field '''dwc:scientificName''' can contain corrections. Corrections are outside the scope of what we should focus on for evaluation in the hackathon, but they are very good topics for discussion at the hackathon. The gold CSV standard (human parsed) will retain EXACTLY the characters as seen on the label (as best as a human can read the text). Retain typos as seen on the label in the gold standard. For the silver standard you can attempt to correct.


:::; 6. (DL) '''Minimum Threshold of Results: What is the minimum data required for parsing results to be output? All the Priority 1 fields? I'm just suggesting that lack of a file might be better than an empty or inadequate file. What are the criteria?''' : Answer: From a confusion matrix perspective, bad answers (garbled output) are worse than no answers (blank). If the parsed output is otherwise garbled or empty, a CSV file with a blank data line (perhaps with only the barcode, if that was readable) is a good strategy for getting a better score from the confusion matrix.
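The blank-over-garbled strategy from the answer to question 6 might be sketched as follows. This is only an illustration under stated assumptions: the column list, the <code>barcode</code> column name, the <code>dwc:recordedBy</code> spelling, and the <code>mostly_letters</code> check are hypothetical and are not part of the hackathon specification.

```python
import csv
import io

# Hypothetical column set: the field names mentioned in this FAQ plus an
# assumed "barcode" column. The real hackathon CSV layout may differ.
FIELDS = [
    "barcode",
    "aocr:verbatimScientificName",
    "dwc:scientificName",
    "dwc:recordedBy",
]


def mostly_letters(value):
    """Toy garble check (an assumption): at least 80% letters or spaces."""
    if not value:
        return False
    good = sum(ch.isalpha() or ch.isspace() for ch in value)
    return good / len(value) >= 0.8


def build_row(barcode, parsed, is_trusted=mostly_letters):
    """Return one CSV row; values that fail the trust check are left blank."""
    row = {name: "" for name in FIELDS}
    row["barcode"] = barcode
    for name, value in parsed.items():
        # Keep a value only if it is a known column and passes the check.
        if name in row and name != "barcode" and is_trusted(value):
            row[name] = value
    return row


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, <code>to_csv([build_row("B001", {"dwc:scientificName": "Qu3rc#s @lb4"})])</code> yields a data line holding only the barcode, so the garbled name is scored as a blank rather than as a wrong answer.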


:::; 7. (DP) '''Hackathon Wiki: Where do I find out more about the overall hackathon?''' : Answer: go to the [[2013 AOCR Hackathon Wiki]] pages.
:::; 8. (DP) '''First Meeting Notes: Where are the Google Notes from the first virtual hackathon meeting with participants on Friday 11 Jan 2013?''' : Answer: see this [http://tinyurl.com/aocrhackmeet1 Google Doc]
:::; 9. (DP) '''Participants: Where is the List of Hackathon Participants?''' : Answer: see [[2013_Hackathon_Participants]]
:::; 10. (DP) '''LABELX Parser: How do I access the LABELX Parser software?''' : Answer: Go to the AOCR VM, where you'll find the software and related documentation in ~/software/labelx/workspace.zip


Back to the [[2013 AOCR Hackathon Wiki]]