OCR Tips: Difference between revisions

Tesseract makes characteristic errors. Some of these, such as "\/\/" or "\X/" substituted for "W", can be globally replaced, as it is highly unlikely that they would occur on their own on a label. Others, such as "O" substituted for "0", "1" or "!" substituted for "l", or "Z" substituted for "2" (or vice versa), can be replaced in a context-dependent manner in dates, latitudes and longitudes, etc. For instance, a string containing multiple errors such as "0ct. !Z, ZOlZ" can be programmatically located with a regular expression and changed to "Oct. 12, 2012" or even "12-October-2012" so that it can be entered into a database.
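
The two kinds of correction described above could be sketched in Python roughly as follows. This is only a minimal illustration, not an established cleanup script: the function names (`fix_w`, `fix_dates`), the character tables, and the assumed "Mon. D, YYYY" date layout are all hypothetical choices made for the example, and real label data would likely need more patterns.

```python
import re

# Confusions named in the text above. These tables are illustrative
# assumptions and deliberately small; extend them for real data.
TO_DIGIT = str.maketrans({"O": "0", "l": "1", "!": "1", "Z": "2"})
TO_LETTER = str.maketrans({"0": "O", "1": "l", "!": "l"})

MONTHS = {"jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"}

DIGITISH = r"[0-9OlZ!]"      # a digit, or a character often misread for one
LETTERISH = r"[A-Za-z01!]"   # a letter, or a character often misread for one

# Assumed layout: three-character month, optional period, day, comma, year.
DATE_RE = re.compile(
    r"(?<![A-Za-z0-9])"
    r"(" + LETTERISH + r"{3})(\.?\s+)"
    r"(" + DIGITISH + r"{1,2})(,\s*)"
    r"(" + DIGITISH + r"{4})"
    r"(?![A-Za-z0-9])"
)


def fix_w(text):
    """Global replacement: these sequences are unlikely to occur on a label."""
    return text.replace("\\/\\/", "W").replace("\\X/", "W")


def fix_dates(text):
    """Context-dependent replacement: only touch strings shaped like a date."""
    def _repair(match):
        month, sep1, day, sep2, year = match.groups()
        month = month.translate(TO_LETTER)
        if month.lower() not in MONTHS:
            return match.group(0)  # not a known month, leave untouched
        return month + sep1 + day.translate(TO_DIGIT) + sep2 + year.translate(TO_DIGIT)

    return DATE_RE.sub(_repair, text)


print(fix_dates(fix_w("0ct. !Z, ZOlZ")))
```

Because the digit and letter substitutions are applied only inside a match whose first field resolves to a known month abbreviation, ordinary words containing "O", "l" or "Z" elsewhere on the label are left alone.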


==== Misc notes: ====


Will often recognize vertical text<br> Image input can be tif, jpeg, or gif
<br>


= <u>'''Omnipage features'''</u>  =