OCR Resources: Difference between revisions



*[http://en.wikipedia.org/wiki/Tesseract_(software) Tesseract] - Open source optical character recognition engine available under the Apache License, Version 2.0. The software runs on various operating systems and is considered one of the more accurate OCR engines available under a free software license.
**[http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseracticdar2007.pdf An Overview of the Tesseract OCR Engine] by Ray Smith at Google Inc.  
**[[OCR Tips#Tesseract_tips|Tesseract tips]]


*[http://daryllafferty.com/salix SALIX] - Semi-automatic Label Information eXtraction system is designed to capture herbarium specimen label data with the use of optical character recognition technologies and transfer those data into a database.
== Coding Outcomes from the aOCR Hackathon (Feb 2013)  ==
* HandwritingDetection ([https://github.com/idigbio-aocr]): an algorithm that sorts images into three sets (no handwriting; little handwriting, with mostly typed or printed text; and lots of handwriting) based on the noise generated by the OCR software.
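
The idea behind HandwritingDetection can be illustrated with a minimal sketch. This is not the hackathon code: the noise measure (the share of OCR tokens that do not look like words or numbers), the function names <code>noise_ratio</code> and <code>classify</code>, and both thresholds are assumptions chosen only for illustration.

```python
import re

# Hypothetical thresholds, chosen for illustration only (not from the hackathon code).
LITTLE_THRESHOLD = 0.15   # below this share of noisy tokens: no handwriting
LOTS_THRESHOLD = 0.40     # at or above this share: lots of handwriting

# A "clean" token is a run of 2+ letters or a run of digits,
# optionally followed by one punctuation mark.
CLEAN_TOKEN = re.compile(r"(?:[A-Za-z]{2,}|\d+)[.,;:]?")


def noise_ratio(ocr_text: str) -> float:
    """Return the fraction of whitespace-separated tokens that look like OCR noise."""
    tokens = ocr_text.split()
    if not tokens:
        # Empty OCR output carries no signal; treated as zero noise in this sketch.
        return 0.0
    noisy = sum(1 for token in tokens if not CLEAN_TOKEN.fullmatch(token))
    return noisy / len(tokens)


def classify(ocr_text: str) -> str:
    """Assign OCR output to one of the three handwriting sets by its noise ratio."""
    ratio = noise_ratio(ocr_text)
    if ratio < LITTLE_THRESHOLD:
        return "no handwriting"
    if ratio < LOTS_THRESHOLD:
        return "little handwriting"
    return "lots of handwriting"
```

In practice the thresholds would need tuning against labels whose handwriting content is already known, and an image whose OCR output is empty probably deserves separate handling.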


== Sample Images  ==