Title | iDigBio Hackathon: SALIX 2 |
Publication Type | Presentation |
Year of Publication | 2013 |
Authors | Lafferty, Daryl |
Keywords | C++, collaboration, machine language, Natural Language Processing, optical character recognition, Outreach, parsing algorithms, SALIX, SALIX2 |
Abstract | SALIX (“Semi-Automatic Label Information eXtraction”) is a parsing system developed and used extensively at Arizona State University. Its purpose is to parse OCR'd label data into the respective data fields (e.g., collector, collection number). The original SALIX required user intervention to format and proofread each label. SALIX 2 tries to remove the “Semi” and make the process fully automatic. It is written in C++ on Windows, and development focused on lichen labels. |
URL | https://www.idigbio.org/sites/default/files/workshop-presentations/aocr-hackathon/SALIX2.ppt |
Description:
Hackathon participant and SALIX developer Daryl Lafferty presents his work and initial results on optimizing SALIX algorithms for parsing OCR output from lichen labels into standard Darwin Core.