Hackathon Challenge

== '''The Process''' ==
For each of the three image data sets, 200 images were hand-picked to create a human-parsed standard against which metrics can be computed. Three files were created for each selected image.
:::; ''Perfect OCR text files'' : Hand-transcribed from each image, these text files reproduce exactly what is in the image. They reflect what the output would look like if the OCR had understood all the data in the image, including the handwriting.
:::; Gold CSV files : These files have Darwin Core element column headers, with the data parsed into the appropriate columns. The data comes from the hand-transcribed gold text files.
:::; Silver CSV files : These files have the same Darwin Core element column headers, with the data parsed into the appropriate columns, but the data comes from the OCR output "as is." Each silver CSV captures the same data from the same images, including any OCR errors.
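
One way to use these files for metrics is to compare a candidate CSV (a silver file, or a parser's output) against the matching gold CSV, column by column. The sketch below is only an illustration under stated assumptions: the function names, the sample rows, and the exact-match scoring rule are hypothetical and are not the challenge's official metric. The column names are Darwin Core terms chosen as examples, and rows are assumed to be paired by position.

```python
import csv
import io


def read_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def field_accuracy(gold_rows, candidate_rows):
    """Return {column: fraction of gold rows the candidate matches exactly}.

    Assumption: rows are paired by position. Candidate rows that are
    missing count as mismatches because we divide by the gold row count.
    """
    if not gold_rows:
        return {}
    scores = {}
    for col in gold_rows[0].keys():
        matches = sum(
            1
            for gold, cand in zip(gold_rows, candidate_rows)
            if (gold[col] or "").strip() == (cand.get(col) or "").strip()
        )
        scores[col] = matches / len(gold_rows)
    return scores


# Hypothetical sample data: the silver rows contain typical OCR confusions.
GOLD = """scientificName,recordedBy,eventDate
Quercus alba,J. Smith,1902-05-14
Acer rubrum,M. Jones,1911-08-02
"""
SILVER = """scientificName,recordedBy,eventDate
Quercus a1ba,J. Smith,1902-05-14
Acer rubrum,M. Jones,19ll-08-02
"""

gold_rows = read_csv_text(GOLD)
silver_rows = read_csv_text(SILVER)
scores = field_accuracy(gold_rows, silver_rows)
print(scores)
```

Exact string matching is the strictest possible choice. A softer measure, such as a per-field edit distance, could be swapped in where partial credit for near-correct OCR is wanted.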


== Parameters ==