Tesseract makes characteristic errors. Some of these, such as "\/\/" or "\X/" substituted for "W", can be replaced globally, since it is highly unlikely that they would occur on their own on a label. Others, such as "O" substituted for "0", "1" or "!" substituted for "l", or "Z" substituted for "2" (or vice versa), can be replaced in a context-dependent manner in dates, latitudes and longitudes, etc. For instance, a string containing multiple errors such as "0ct. !Z, ZOlZ" can be located programmatically with a regular expression and changed to "Oct. 12, 2012" or even "12-October-2012" so that it can be entered into a database.
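The two kinds of correction described above can be sketched in Python. This is only an illustration under stated assumptions: the function names (<code>fix_global</code>, <code>fix_dates</code>), the character mappings, and the single "Mon. D, YYYY" date layout are hypothetical choices drawn from the examples in the text, not part of Tesseract or of any existing label-processing tool.

```python
import re

# Global replacements: sequences unlikely to occur on a label in their own right.
GLOBAL_FIXES = {r"\/\/": "W", r"\X/": "W"}

# Context-dependent mappings (assumed from the examples in the text).
DIGIT_FIXES = str.maketrans({"O": "0", "l": "1", "!": "1", "Z": "2"})   # digit context
LETTER_FIXES = str.maketrans({"0": "O", "1": "l", "!": "l", "2": "Z"})  # letter context

MONTHS = {m.lower(): m for m in
          ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]}

DIGITISH = r"[0-9OlZ!]"  # a digit, or a character commonly misread in place of one

# Matches dates laid out like "Oct. 12, 2012", tolerating the confusions above.
DATE_RE = re.compile(
    rf"(?<![A-Za-z0-9])(?P<month>[A-Za-z0-9!]{{3}})\.?\s+"
    rf"(?P<day>{DIGITISH}{{1,2}}),\s*(?P<year>{DIGITISH}{{4}})(?![A-Za-z0-9])"
)


def fix_global(text):
    """Apply the replacements that are safe anywhere in the text."""
    for wrong, right in GLOBAL_FIXES.items():
        text = text.replace(wrong, right)
    return text


def _fix_date(match):
    # Repair the month as letters; leave the text alone if it is not a month.
    month_key = match.group("month").translate(LETTER_FIXES).lower()
    if month_key not in MONTHS:
        return match.group(0)
    # Repair day and year as digits.
    day = int(match.group("day").translate(DIGIT_FIXES))
    year = int(match.group("year").translate(DIGIT_FIXES))
    if not 1 <= day <= 31:
        return match.group(0)
    return f"{MONTHS[month_key]}. {day}, {year}"


def fix_dates(text):
    """Correct OCR confusions only inside date-like strings."""
    return DATE_RE.sub(_fix_date, text)


print(fix_dates("0ct. !Z, ZOlZ"))
print(fix_global(r"\/\/ashington"))
```

Restricting the digit and letter mappings to matches of the date pattern is what keeps the correction context-dependent: a "Z" or "O" elsewhere on the label is left untouched. Latitudes and longitudes would need their own patterns along the same lines.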
==== Misc notes: ====
Will often recognize vertical text<br> Image input can be tif, jpeg, or gif
<br>
= <u>'''Omnipage features'''</u> =