Hackathon FAQ



5. '''Probable Corrections: When information is not certain (e.g. recordedBy, scientificName), is it better to guess or to omit?''' aocr:verbatimScientificName preserves the original text as captured from OCR output, while dwc:scientificName can contain corrections. Corrections are outside the scope of what we will evaluate at the hackathon, but they are a very good topic for discussion there. The gold-standard CSV (human parsed) retains EXACTLY the characters as seen on the label (as well as a human can read the text), so typos seen on the label are kept in the gold standard. For the silver standard you can attempt to correct them.
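To make the distinction concrete, here is a minimal sketch (the specimen values are invented for illustration; only the two field names come from the discussion above): the verbatim field keeps the label/OCR text exactly as read, while the Darwin Core field may hold a corrected form in a silver-standard attempt.
<syntaxhighlight lang="python">
# Illustrative record only; the values are made up.
record = {
    # Exactly as read from the label, typo retained
    # (the gold standard also keeps this verbatim form).
    "aocr:verbatimScientificName": "Quercus albaa",
    # Optional corrected form, permissible in a silver-standard attempt.
    "dwc:scientificName": "Quercus alba",
}
</syntaxhighlight>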
6. '''Minimum Threshold of Results: What is the minimum data required for parsing results to be output? All the Priority 1 fields? I'm just suggesting that no file might be better than an empty or inadequate file. What are the criteria?''' Bad answers (garbled output) are worse than no answers (blank) from a confusion-matrix perspective.
Answer: A CSV file with a blank data line (perhaps only the barcode was readable) is a good strategy for getting a better score from the confusion matrix when the parsed output would otherwise be garbled or empty.
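As a rough sketch of that fallback strategy (the column list, file name, and barcode value here are hypothetical, not the official hackathon schema): write the header plus a single row that is blank except for whatever was reliably read, such as the barcode.
<syntaxhighlight lang="python">
import csv

# Hypothetical column list; substitute the actual Priority 1 fields.
FIELDS = ["catalogNumber", "dwc:scientificName", "dwc:recordedBy", "dwc:eventDate"]

def write_fallback_row(path, barcode):
    """Write the header plus one row that is blank except for the barcode,
    rather than emitting garbled parsed values."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        writer.writerow([barcode] + [""] * (len(FIELDS) - 1))

write_fallback_row("output.csv", "BARCODE0001234")
</syntaxhighlight>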