*Sample of what you will see there for Set 1 (LBCC TCN lichen bryophyte packet labels):
:::; human hand-types the image data (no errors) into text files == gold.txt : sample: ~/dataset/gold/outputs
:::; human parses data from the gold.txt files into a gold csv file (Darwin Core fields) == gold.csv : sample: ~/dataset/gold/parsed
:::; OCR (of choice: ABBYY, Tesseract, GOCR/JOCR, OCRopus, OmniPage) is run on these images == output to silver.txt files : sample: ~/dataset/silver/outputs
:::; human parses the "dirty" OCR output from silver.txt into the same Darwin Core fields == silver.csv : sample: ~/egilbert/dataset/silver/parsed
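As a rough illustration of how the gold and silver text outputs listed above line up, the following minimal Python sketch pairs the two files for one image. The helper name <code>paired_text_paths</code>, the image id <code>label001</code>, and the one-file-per-image <code>&lt;image_id&gt;.txt</code> naming are assumptions made for illustration only; the sample directories are the ones given in the list.

```python
from pathlib import PurePosixPath

# Sample directories copied from the list above ("~" is left unexpanded).
GOLD_TXT_DIR = PurePosixPath("~/dataset/gold/outputs")
SILVER_TXT_DIR = PurePosixPath("~/dataset/silver/outputs")


def paired_text_paths(image_id):
    """Return (gold, silver) text file paths for one image.

    Hypothetical helper: assumes each image has one text file named
    "<image_id>.txt" in both directories, which the page does not state.
    """
    filename = f"{image_id}.txt"
    return GOLD_TXT_DIR / filename, SILVER_TXT_DIR / filename


if __name__ == "__main__":
    gold, silver = paired_text_paths("label001")  # "label001" is a made-up id
    print(gold)
    print(silver)
```

Pairing files this way would let the hand-typed gold text and the OCR silver text for the same label be read side by side; if the actual file naming differs, only the <code>filename</code> line needs to change.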