Hackathon Challenge: Difference between revisions

(46 intermediate revisions by 5 users not shown)
Line 48: Line 48:
:::;Data set 2 Herbarium Sheet Images:  10000+ images in /home/aocr/datasets/herbs/inputs/raw
::::5000 are from NYBG in /home/aocr/sgottschalk_images.tar.gz
::::5000 are from BRIT
 
:::;Data set 2 Herbarium Sheet OCR output text files for parsing:
::::/home/aocr/datasets/herbs/outputs/gocr
Line 54: Line 54:
::::/home/aocr/datasets/herbs/outputs/ocropus
::::/home/aocr/datasets/herbs/outputs/tesseract
:::;'''[https://www.idigbio.org/wiki/index.php/Media:SampleCSV.jpg SAMPLE PARSED CSV FILE]''' to show '''column headers and values'''
:::;'''[[Media:01498198.jpg|SAMPLE IMAGE]]''' parsed in the SAMPLE CSV next.
:::;'''[[Media:SampleCSV.jpg|SAMPLE PARSED CSV FILE]]''' to show '''column headers and values'''


:::;Data set 3 Entomology Images: /home/aocr/datasets/ent/inputs/raw
Line 60: Line 61:
::::or see /home/aocr/oboyski_images.tar.gz
:::;Data set 3 Entomology OCR output ABBYY text files for parsing: /home/aocr/datasets/ent/outputs/abbyy
=== [[Dataset Errata]]  ===
*Known and newly discovered errors in the .txt and .csv files, listed as they are found.


== Parameters ==