iDigBio
Published on iDigBio (https://www.idigbio.org)

Software Options

Forums: Augmenting OCR and NLP [1]

 
Tesseract [2] is a free-software optical character recognition (OCR) engine that runs on various operating systems. While numerous options are available (http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software [3]), Tesseract is considered one of the more accurate free-software OCR engines. The LBCC project (http://lbcc.limnology.wisc.edu/ [4]) has adopted Tesseract as its central OCR engine, mainly because of a commitment to open-source technologies, active Tesseract development, and the ease of incorporating the software into a web browser environment. However, LBCC would be interested in incorporating other OCR technologies into its workflow if they are shown to produce better output.
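For anyone who wants to try Tesseract from a script, here is a minimal sketch of calling its command-line program from Python. It assumes the `tesseract` binary is installed and on the PATH. The function names `build_tesseract_command` and `run_ocr` are hypothetical helpers made up for this illustration; this is not how LBCC integrates Tesseract, just one simple way to get text out of an image.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build the argument list for a Tesseract CLI call (hypothetical helper)."""
    # Passing "stdout" as the output base makes the tesseract CLI print the
    # recognized text instead of writing it to <outputbase>.txt.
    # "-l" selects the language model, e.g. "eng" for English.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path: str, lang: str = "eng") -> Optional[str]:
    """Return the recognized text, or None if tesseract is not on the PATH."""
    # Assumption: the tesseract binary is installed separately.
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str
        check=True,           # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

Calling `run_ocr("label.png")` on a specimen-label image (the file name is only an example) would return whatever text Tesseract recognizes. Because the OCR call sits behind one small function, another engine could be swapped in later and its output compared against Tesseract's on the same images.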


Source URL: https://www.idigbio.org/content/software-options

Links
[1] https://www.idigbio.org/taxonomy/term/52
[2] http://en.wikipedia.org/wiki/Tesseract_(software)
[3] http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software
[4] http://lbcc.limnology.wisc.edu/