**Host server name: aocr1.acis.ufl.edu
**User name and password: given to you at our first meeting and via email.
*Sample of what you will see there for Set 1 (LBCC TCN lichen and bryophyte packet labels):
<pre>1. A human hand-parses the image (no errors) into a text file == gold.txt
   sample: ~/egilbert/dataset/gold/outputs
2. A human parses the data out of the gold.txt files into a csv file (Darwin Core fields) == gold.csv
   sample: ~/egilbert/dataset/gold/parsed
3. OCR (of choice: ABBYY, Tesseract, GOCR/JOCR, OCRopus, OmniPage) is run on these images == output to silver.txt files
   sample: ~/egilbert/dataset/silver/outputs
3a. A human parses the "dirty" OCR out of these silver.txt files into Darwin Core fields == silver.csv
   sample: ~/egilbert/dataset/silver/parsed</pre>
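To get a quick feel for how "dirty" a silver file is relative to its gold counterpart, the two text outputs can be compared character by character. The following is only a minimal sketch, not part of the provided dataset tooling: it assumes that files in gold/outputs and silver/outputs share the same file names and end in .txt, and the function names <code>similarity</code> and <code>compare_outputs</code> are made up for illustration. It uses Python's standard <code>difflib</code> ratio as a rough similarity score, which is not a formal OCR accuracy metric.

```python
import difflib
from pathlib import Path


def similarity(gold_text: str, silver_text: str) -> float:
    """Return a rough 0..1 similarity ratio between gold and OCR text."""
    return difflib.SequenceMatcher(None, gold_text, silver_text).ratio()


def compare_outputs(gold_dir: Path, silver_dir: Path) -> dict[str, float]:
    """Score each silver .txt file against the gold file of the same name.

    Assumption: both directories use matching file names.
    """
    scores = {}
    for gold_file in sorted(gold_dir.glob("*.txt")):
        silver_file = silver_dir / gold_file.name
        if not silver_file.exists():
            continue  # no OCR output for this label yet
        scores[gold_file.name] = similarity(
            gold_file.read_text(encoding="utf-8", errors="replace"),
            silver_file.read_text(encoding="utf-8", errors="replace"),
        )
    return scores


if __name__ == "__main__":
    # Paths taken from the sample listing above.
    base = Path("~/egilbert/dataset").expanduser()
    results = compare_outputs(base / "gold" / "outputs", base / "silver" / "outputs")
    for name, score in results.items():
        print(f"{name}\t{score:.3f}")
```

A score near 1.0 means the OCR text is almost identical to the hand-typed gold text; lower scores flag labels where parsing the silver.txt file into Darwin Core fields is likely to be harder.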
== Parameters ==