:::; 4. (DL) '''Hybrids: How should we treat hybrids? Omit? List both names in the scientificName field?''' : For the scope of this hackathon, the goal is to get any taxon name from the OCR output into the aocr:verbatimScientificName field of the CSV file. This could include the author. Concentrate on capturing what is on the label. No further parsing is required, but individuals wanting to go further may certainly do so. There are inherent challenges here [more on this later] that require software taxonomic intelligence beyond the scope of this hackathon.
:::; 5. (DL) '''Probable Corrections: When information is not certain (e.g. recordedBy, scientificName), is it better to guess or to omit?''' : Answer: '''aocr:verbatimScientificName''' preserves the original text as captured from the OCR output. The field '''dwc:scientificName''' can contain corrections. Corrections are outside the scope of what we should focus on for evaluation in the hackathon, but they are very good topics for discussion at the hackathon. The gold CSV standard (human parsed) will retain EXACTLY the characters as seen on the label (as best as a human can read the text), so retain typos as seen on the label in the gold standard. For the silver standard, you can attempt corrections.
:::; 6. (DL) '''Minimum Threshold of Results: What is the minimum data required for parsing results to be output? All the Priority 1 fields? I'm just suggesting that lack of a file might be better than an empty or inadequate file. What are the criteria?''' : Answer: Bad answers (garbled output) are worse than no answers (blank) from a confusion matrix perspective.
Answer: A CSV file with a blank data line (perhaps containing only the barcode, if that was all that was readable) is a good strategy for getting a better score from the confusion matrix when the parsed output is otherwise garbled or empty.
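The blank-line strategy from question 6 could be sketched roughly as follows. This is only an illustration, not the hackathon's scoring or output code: the column list, the <code>barcode</code> column name, the <code>dwc:recordedBy</code> column, and the <code>looks_garbled</code> heuristic (with its 0.6 threshold) are all assumptions made for the example.

```python
import csv
import io

# Hypothetical column subset; the real hackathon CSV may define other fields.
FIELDS = ["barcode", "aocr:verbatimScientificName", "dwc:recordedBy"]


def looks_garbled(value, min_alpha_ratio=0.6):
    """Assumed heuristic: a value is garbled when too few of its
    characters are letters or spaces. The threshold is arbitrary."""
    if not value:
        return False
    ok = sum(ch.isalpha() or ch.isspace() for ch in value)
    return ok / len(value) < min_alpha_ratio


def build_row(barcode, parsed):
    """Start from an all-blank row, keep the barcode, and copy over
    only the parsed values that do not look garbled."""
    row = {name: "" for name in FIELDS}
    row["barcode"] = barcode
    for name, value in parsed.items():
        if name in row and name != "barcode" and not looks_garbled(value):
            row[name] = value
    return row


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


row = build_row(
    "NY01234",
    {"aocr:verbatimScientificName": "~~#@1 !!%", "dwc:recordedBy": "J. Smith"},
)
print(to_csv([row]))
```

Here the garbled name is left blank rather than written out, while the barcode and the readable collector name are kept.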
:::; 7. (DP) '''Hackathon Wiki: Where do I find out more about the overall hackathon?''' : Answer: Go to the [[2013 AOCR Hackathon Wiki]] pages.
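As an illustration of the verbatim/corrected split described in question 5, a minimal sketch might keep the OCR text untouched in aocr:verbatimScientificName and put a correction in dwc:scientificName only when one is known. The <code>KNOWN_CORRECTIONS</code> table and the <code>name_fields</code> helper are hypothetical, and leaving dwc:scientificName blank when no correction is known is an assumption of this example, not a hackathon rule.

```python
# Hypothetical correction table, invented for this example only.
KNOWN_CORRECTIONS = {"Quercus a1ba L.": "Quercus alba L."}


def name_fields(ocr_name, corrections=KNOWN_CORRECTIONS):
    """Return both name fields for one label."""
    return {
        # Verbatim field: exactly what OCR produced, typos included.
        "aocr:verbatimScientificName": ocr_name,
        # Corrected field: filled only when a correction is known (assumption).
        "dwc:scientificName": corrections.get(ocr_name, ""),
    }


print(name_fields("Quercus a1ba L."))
```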
Back to the [[2013 AOCR Hackathon Wiki]]