Line 25: | Line 25: | ||
**user name and password given to you at our first meeting and via email. | **user name and password given to you at our first meeting and via email. | ||
*Sample of what you will see there for Set 1 (LBCC TCN lichen bryophyte packet labels): | *Sample of what you will see there for Set 1 (LBCC TCN lichen bryophyte packet labels): | ||
:::; human hand-parses the image (no errors) into a text file == gold.txt : sample: ~/egilbert/dataset/gold/outputs | |||
:::; human (parses) gets the data out of the gold.txt files into a csv file (darwin core fields) == gold.csv : sample: ~/egilbert/dataset/gold/parsed | |||
human (parses) gets the data out of the gold.txt files into a csv file (darwin core fields) == gold.csv | :::; OCR (of choice, ABBYY, TESSERACT, GOCR/JOCR, OCRopus, Omnipage) run on these images = output to silver.txt files : sample: ~/egilbert/dataset/silver/outputs | ||
:::; 3a. human (parses) the "dirty" OCR out of these silver.txt into darwin core fields == silver.csv : sample: ~/egilbert/dataset/silver/parsed</pre> | |||
OCR (of choice, ABBYY, TESSERACT, GOCR/JOCR, OCRopus, Omnipage) run on these images = output to silver.txt files | |||
3a. human (parses) the "dirty" OCR out of these silver.txt into darwin core fields == silver.csv | |||
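The parsing steps listed above (gold.txt to gold.csv, silver.txt to silver.csv) both end in a CSV file whose columns are Darwin Core fields. A minimal sketch of that last step might look like the following. The column subset in `DWC_FIELDS`, the helper names, and the sample record are assumptions for illustration only; the page does not say which Darwin Core terms the real gold.csv and silver.csv files use.

```python
import csv
import io

# Hypothetical subset of Darwin Core terms; the actual columns used in
# gold.csv / silver.csv are not specified on this page.
DWC_FIELDS = [
    "catalogNumber",
    "scientificName",
    "recordedBy",
    "eventDate",
    "country",
    "stateProvince",
    "locality",
]


def write_dwc_csv(records, out):
    """Write parsed label records (dicts) as CSV rows under Darwin Core headers.

    Missing fields are written as empty strings; keys that are not in
    DWC_FIELDS are silently dropped.
    """
    writer = csv.DictWriter(
        out, fieldnames=DWC_FIELDS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)


def dwc_csv_text(records):
    """Return the CSV as a string (convenience wrapper, also hypothetical)."""
    buf = io.StringIO(newline="")
    write_dwc_csv(records, buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up record standing in for one hand-parsed packet label.
    sample = [{"scientificName": "Cladonia rangiferina", "recordedBy": "J. Doe"}]
    print(dwc_csv_text(sample))
```

Because the same writer would serve both the gold and the silver outputs, the two CSV files end up with identical headers, which keeps them directly comparable column by column.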
== Parameters == | == Parameters == |