Title | iConference2013: Linking Data -- Biodiversity Heritage Library -- supporting knowledge discovery from digitized content |
Publication Type | Presentation |
Year of Publication | 2013 |
Authors | Mignault, John |
Keywords | Augmenting OCR, BHL, Biodiversity Heritage Library, Data Integration, Data Mining, Semantic Web, Text Extraction, Workflow |
Abstract | The Biodiversity Heritage Library (BHL), a global consortium of natural history and botanical libraries, is an ongoing project to digitize the legacy literature in its members' collections for open access. In partnership with the Internet Archive, and through its portal, BHL has made 40 million pages openly available to the global research community. The rapid growth of the text corpus has made it challenging to identify and extract semantic information from it, and many of these challenges resemble those faced in OCR workflows and in extraction from specimen labels. We will discuss the improvements in knowledge extraction that could result from better OCR workflow and accuracy, as well as the implications for more intelligent data integration in biodiversity informatics. |
URL | https://www.idigbio.org/sites/default/files/workshop-presentations/aocr-hackathon/AOCRandBHL.ppt |