Hackathon Challenge: Difference between revisions

Line 64: Line 64:
=== [[Dataset Errata]] ===
*Known and newly discovered errors in the .txt and .csv files, listed as they are found.
'''Gold Parsing Errors'''


Many of the Lichen Gold labels have verbatimLatitude and verbatimLongitude, but the Gold Parsed files do not have the calculated decimalLatitude and decimalLongitude.  This seems especially true for the New York labels. (Daryl)
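The gap described above can be flagged mechanically. The following is a minimal sketch, assuming each Gold Parsed record has already been loaded into a dictionary keyed by field name; the helper name and the sample values are hypothetical and do not come from the dataset.

```python
def missing_decimal_coords(record):
    """Return True when both verbatim coordinates are present
    but at least one decimal coordinate is empty or absent."""
    def filled(key):
        # Treat missing keys, None, and whitespace-only strings as empty.
        return bool((record.get(key) or "").strip())

    has_verbatim = filled("verbatimLatitude") and filled("verbatimLongitude")
    has_decimal = filled("decimalLatitude") and filled("decimalLongitude")
    return has_verbatim and not has_decimal


# Hypothetical record: verbatim values present, decimal values never filled in.
example = {
    "verbatimLatitude": "42 deg 30 min N",
    "verbatimLongitude": "74 deg 15 min W",
    "decimalLatitude": "",
}
print(missing_decimal_coords(example))
```

Running this over every Gold Parsed file and grouping the flagged records by collection would show whether the New York labels are in fact the most affected.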
Line 91: Line 92:


Several Gold Parsed labels are inconsistent about whether a period at the end of a field is included as part of the field.  Example:  verbatimCoordinates in NY01075782_lg.csv includes the trailing period, while NY01075780_lg.csv does not.
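One way to survey the trailing-period inconsistency is to split files by whether a given field ends with a period. This is a sketch only: it assumes the records are already loaded into a mapping of file name to field dictionary, the function name is hypothetical, and the coordinate strings in the example are invented placeholders rather than the real label contents.

```python
def trailing_period_report(records, field="verbatimCoordinates"):
    """Split file names by whether `field` ends with a period.

    `records` maps a file name to a dict of field name -> value.
    Files where the field is empty or absent are skipped.
    """
    with_period, without_period = [], []
    for name, record in records.items():
        value = (record.get(field) or "").strip()
        if not value:
            continue
        if value.endswith("."):
            with_period.append(name)
        else:
            without_period.append(name)
    return sorted(with_period), sorted(without_period)


# Placeholder values; only the file names come from the errata entry.
sample = {
    "NY01075782_lg.csv": {"verbatimCoordinates": "placeholder coordinates."},
    "NY01075780_lg.csv": {"verbatimCoordinates": "placeholder coordinates"},
}
with_period, without_period = trailing_period_report(sample)
print(with_period, without_period)
```

If both lists come back non-empty for the same field, the Gold Parsed files disagree on the convention for that field.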
Gold Parsed NY01075761_lg.txt corrects a Gold OCR error by appending the 1 to 0107576.  The field should be corrected in the Gold OCR, but until that is done, the parsing should be verbatim (see below under Gold OCR Errors).
'''Gold OCR Errors'''
NY01075761_lg.txt has catalogNumber as 0107576, omitting the 1 at the end.
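Errors of this kind might be caught by comparing the catalogNumber with the digits in the file name. The sketch below assumes that the file name embeds the catalog number after a letter prefix and before an underscore, which holds for NY01075761_lg.txt but is not confirmed for the dataset as a whole; the function name is hypothetical.

```python
import re


def catalog_number_matches_filename(filename, catalog_number):
    """Compare catalogNumber with the digit run in the file name.

    Assumes names shaped like <letters><digits>_<suffix>, e.g. NY01075761_lg.txt.
    Returns None when the file name does not follow that shape.
    """
    match = re.match(r"[A-Za-z]+(\d+)_", filename)
    if match is None:
        return None
    return match.group(1) == catalog_number.strip()


print(catalog_number_matches_filename("NY01075761_lg.txt", "0107576"))   # False
print(catalog_number_matches_filename("NY01075761_lg.txt", "01075761"))  # True
```

A mismatch only signals that the file deserves a manual look; it does not by itself say whether the OCR text or the file name is wrong.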


== Parameters ==
