OCR Resources: Difference between revisions

Revision as of 12:15, 2 October 2012

OCR Software used by ADBC projects

ABBYY FineReader - high-performing proprietary OCR software provided by the ABBYY software company. The Professional and Corporate Editions are designed specifically for Microsoft Windows operating systems.
- FineReader tips

ABBYY Recognition Server - extends the features of FineReader and places them in a server-based scalable platform.
- Recognition Server tips

GOCR (or JOCR) is a free optical character recognition program, initially written by Jörg Schulenburg. It can be used to convert or scan image files (portable pixmap or PCX) into text files.

OCRopus - free document analysis and optical character recognition (OCR) system with a very modular, plugin-based design, released under the Apache License, Version 2.0.

Tesseract - Open-source optical character recognition engine available under the Apache License, Version 2.0. The software is capable of functioning on various operating systems. Considered to be one of the more accurate OCR engines available under a free software license.
- An Overview of the Tesseract OCR Engine by Ray Smith at Google Inc.
- Tesseract tips

Xerox OCR engine -

List of other OCR software: http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software
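
As a rough illustration of how one of the command-line engines listed above could be scripted, the sketch below wraps a basic Tesseract run in Python. It assumes the tesseract executable is installed and on the PATH; the function names and the file names label.png and label_out are hypothetical and chosen only for this example. It is not the workflow used by any ADBC project.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic Tesseract run.

    Tesseract's command line takes the input image, an output base name
    (it appends ".txt" itself), and an optional -l language code.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_label(image_path, output_base, lang="eng"):
    """Run Tesseract on one label image and return the recognized text.

    Raises RuntimeError if the tesseract executable is not on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    # Tesseract writes its plain-text result to <output_base>.txt
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_tesseract_command("label.png", "label_out"))
```

Calling ocr_label("label.png", "label_out") would, under these assumptions, write label_out.txt next to the script and return its contents as a string for further parsing.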

Biodiversity Informatics Tools Incorporating OCR Technology

Apiary Project - High-throughput workflow for computer-assisted human parsing of biological specimen label data

HerbIS - (Erudite Recorded Botanical Information Synthesizer) - Software algorithms that process and present herbarium label data in a machine-understandable format through the use of natural language processing (NLP). Created at the Yale Peabody Museum of Natural History.

Symbiota - Specimen-based virtual flora/fauna software with a built-in module for specimen digitization that incorporates OCR technology

SALIX - The Semi-automatic Label Information eXtraction system, designed to capture herbarium specimen label data with the use of optical character recognition technologies and transfer those data into a database.

Sample Images

Bryophyte Images from LBCC project (10,500 image URLs)
- https://www.idigbio.org/wiki/images/f/fd/BryophyteOcrImageSamples.odt
Lichen Images from LBCC project (10,500 image URLs)
- https://www.idigbio.org/wiki/images/f/f1/LichensOcrImageSamples.odt
NYBG plant herbarium sheets
BRIT plant herbarium images
Insect images?

Museum Specimen Label Examples

See sample herbarium label and content defined from the San Diego County Natural History Museum Plant Atlas Project FAQ.
Sample herbarium label from University of Colorado (COLO): Sample herbarium label

@@ Line 2: / Line 2: @@
 *[http://finereader.abbyy.com/corporate/ ABBYY FineReader] - high performing proprietary OCR software provided by the [http://www.abbyy.com ABBYY] software company. The Professional and Corporate Editions are designed specifically for Microsoft Windows operating systems.
-**[[OCR Tips| FineReader tips]]
+**[[OCR Tips#FineReader tips|FineReader tips]]
 *[http://www.abbyy.com/recognition_server/functionality/?utm_expid=34274949-7&utm_referrer=http%3A%2F%2Fwww.abbyy.com%2Frecognition_server%2Fkey_features%2F ABBYY Recognition Server] - extends the features of FineReader and places them in a server-based scalable platform.
-**[[OCR Tips| Recognition Server tips]]
+**[[OCR Tips#Recognition Server| Recognition Server tips]]
 *[http://en.wikipedia.org/wiki/GOCR GOCR] (or JOCR) is a free optical character recognition program, initially written by Jörg Schulenburg. It can be used to convert or scan image files (portable pixmap or PCX) into text files.
@@ Line 13: / Line 13: @@
 *[http://en.wikipedia.org/wiki/Tesseract_(software) Tesseract] - Open source optical character recognition engine available under the Apache License, Version 2.0. Software is capable to functioning on various operating systems. Considered to be one of the more accurate OCR engines that are available under a free software license.
 **[http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseracticdar2007.pdf An Overview of the Tesseract OCR Engine] by Ray Smith at Google Inc.
-**[[OCR Tips| Tesseract tips]]
+**[[OCR Tips#Tesseract tips| Tesseract tips]]
 *Xerox OCR engine -
