iDigBio Hackathon: SALIX 2

Title: iDigBio Hackathon: SALIX 2
Publication Type: Presentation
Year of Publication: 2013
Authors: Lafferty, Daryl
Keywords: C++, collaboration, machine language, Natural Language Processing, optical character recognition, Outreach, parsing algorithms, SALIX, SALIX2
Abstract: SALIX is the “Semi-Automatic Label Information eXtraction” parsing system, developed and used extensively at Arizona State University. Its purpose is to parse OCR'd label data into the respective data fields (e.g., collector, collection number). The original SALIX required user intervention on each label for formatting and proofreading. SALIX 2 tries to remove the “Semi” and make the process fully automatic. It is written in C++ on Windows, and development has focused on lichen labels.
Hackathon participant and SALIX developer Daryl Lafferty presents his work and initial results on optimizing SALIX algorithms for parsing OCR output from lichen labels into standard Darwin Core.