OCR Resources: Difference between revisions

308 bytes added, 4 December 2013

no edit summary

@@ Line 15: / Line 15: @@
 *[http://en.wikipedia.org/wiki/Tesseract_(software) Tesseract] - Open source optical character recognition engine available under the Apache License, Version 2.0. The software runs on various operating systems and is considered one of the more accurate OCR engines available under a free software license.
 **[http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseracticdar2007.pdf An Overview of the Tesseract OCR Engine] by Ray Smith at Google Inc.
 **[[OCR Tips#Tesseract_tips|Tesseract tips]]
@@ Line 32: / Line 31: @@
 *[http://daryllafferty.com/salix SALIX] - Semi-automatic Label Information eXtraction system is designed to capture herbarium specimen label data with the use of optical character recognition technologies and transfer those data into a database.
+== Coding Outcomes from the aOCR Hackathon (Feb 2013)   ==
+* HandwritingDetection ([https://github.com/idigbio-aocr]): an algorithm that separates images into three sets (no handwriting, little handwriting with mostly typed or printed text, and lots of handwriting) based on the noise generated by the OCR software.
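The idea behind HandwritingDetection can be illustrated with a minimal Python sketch. It is not the hackathon code: it assumes that "noise" means the share of OCR output tokens that look like neither a word nor a number, and the names <code>noise_ratio</code> and <code>classify_handwriting</code>, the token pattern, and both thresholds are hypothetical choices made for illustration only.

```python
import re

# Hypothetical cut-offs; the values used by the hackathon code are not stated here.
LITTLE_THRESHOLD = 0.2
LOTS_THRESHOLD = 0.5

# Assumption: a "clean" token is 2+ letters or a run of digits,
# optionally followed by one punctuation mark.
_CLEAN_TOKEN = re.compile(r"^(?:[A-Za-z]{2,}|\d+)[.,;:]?$")


def noise_ratio(ocr_text: str) -> float:
    """Return the fraction of whitespace-separated tokens that look like OCR noise."""
    tokens = ocr_text.split()
    if not tokens:
        return 0.0  # no OCR output at all is treated as no noise
    noisy = sum(1 for token in tokens if not _CLEAN_TOKEN.match(token))
    return noisy / len(tokens)


def classify_handwriting(ocr_text: str) -> str:
    """Sort OCR output into 'none', 'little', or 'lots' of handwriting."""
    ratio = noise_ratio(ocr_text)
    if ratio >= LOTS_THRESHOLD:
        return "lots"
    if ratio >= LITTLE_THRESHOLD:
        return "little"
    return "none"
```

In this sketch a fully typed label yields almost no noisy tokens and falls into the "none" set, while a label where the OCR engine returns mostly symbol runs falls into the "lots" set. Single-letter tokens are counted as noise, which is a simplification; real thresholds would need tuning against labelled sample images.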
 == Sample Images  ==