(46 intermediate revisions by 5 users not shown) | |||
Line 48: | Line 48: | ||
:::;Data set 2 Herbarium Sheet Images: 10000+ images in /home/aocr/datasets/herbs/inputs/raw | :::;Data set 2 Herbarium Sheet Images: 10000+ images in /home/aocr/datasets/herbs/inputs/raw | ||
::::5000 are from NYBG in /home/aocr/sgottschalk_images.tar.gz | ::::5000 are from NYBG in /home/aocr/sgottschalk_images.tar.gz | ||
:::;Data set 2 Herbarium Sheet OCR output text files for parsing: | :::;Data set 2 Herbarium Sheet OCR output text files for parsing: | ||
::::/home/aocr/datasets/herbs/outputs/gocr | ::::/home/aocr/datasets/herbs/outputs/gocr | ||
Line 54: | Line 54: | ||
::::/home/aocr/datasets/herbs/outputs/ocropus | ::::/home/aocr/datasets/herbs/outputs/ocropus | ||
::::/home/aocr/datasets/herbs/outputs/tesseract | ::::/home/aocr/datasets/herbs/outputs/tesseract | ||
:::;'''[ | :::;'''[[Media:01498198.jpg|SAMPLE IMAGE]]''' parsed in the SAMPLE CSV next. | ||
:::;'''[[Media:SampleCSV.jpg|SAMPLE PARSED CSV FILE]]''' to show '''column headers and values''' | |||
:::;Data set 3 Entomology Images: /home/aocr/datasets/ent/inputs/raw | :::;Data set 3 Entomology Images: /home/aocr/datasets/ent/inputs/raw | ||
Line 60: | Line 61: | ||
::::or see /home/aocr/oboyski_images.tar.gz | ::::or see /home/aocr/oboyski_images.tar.gz | ||
:::;Data set 3 Entomology OCR output ABBYY text files for parsing: /home/aocr/datasets/ent/outputs/abbyy | :::;Data set 3 Entomology OCR output ABBYY text files for parsing: /home/aocr/datasets/ent/outputs/abbyy | ||
=== [[Dataset Errata]] === | |||
*Known and newly discovered errors in the .txt and .csv files, listed as they are found. | |||
== Parameters == | == Parameters == |