Dataset Errata: Difference between revisions

== Right Double Quote ==
The following files contain the Unicode character U+201D, Right Double Quotation Mark:
* datasets/lichens/gold/ocr/WIS-L-0012053_lg.txt
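
A minimal sketch of how such files could be located, assuming the OCR files are UTF-8 text. The helper names `find_char` and `scan_directory` are hypothetical and not part of any dataset tooling.

```python
from pathlib import Path

# U+201D RIGHT DOUBLE QUOTATION MARK, the character named in this erratum.
RIGHT_DOUBLE_QUOTE = "\u201d"


def find_char(text, char=RIGHT_DOUBLE_QUOTE):
    """Return 1-based (line, column) pairs where char occurs in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == char:
                hits.append((line_no, col))
    return hits


def scan_directory(root, pattern="*.txt"):
    """Map each matching file under root to its hits; files without hits are skipped."""
    results = {}
    for path in sorted(Path(root).rglob(pattern)):
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_char(text)
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_directory("datasets/lichens/gold/ocr")` would then list every file containing the character, with its positions.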


== Parse file errors ==
::Gold Parsed decimalLatitude and decimalLongitude values are inconsistent in many labels.  Both fields are omitted entirely from NYBG lichens and Tennessee lichens.  Gold Parsed WIS-L-0011728_lg.csv has decimalLatitude and decimalLongitude rounded to 3 decimal digits (e.g. 60.467).  WIS-L-0011729_lg.csv has decimalLatitude rounded to 2 decimal digits (60.15) and decimalLongitude rounded to 1 decimal digit (-152.6).  These variations are typical of those found throughout the files.  It is possible that trailing zeros were simply stripped, but the inconsistency makes it impossible for a parsing program to match every label exactly.
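
One possible workaround, sketched here as an assumption rather than as part of any official scoring procedure, is to compare a parsed coordinate with the gold value at whatever precision the gold value carries. The function names are hypothetical. The comparison is deliberately lenient: if trailing zeros were stripped from a gold value, a parsed value that differs only in the stripped digits will still count as a match.

```python
from decimal import Decimal, ROUND_HALF_UP


def decimal_places(value):
    """Number of digits after the decimal point in a plain numeric string."""
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)


def matches_gold(parsed, gold):
    """Compare parsed to gold after rounding parsed to gold's precision.

    Blank strings stand for an omitted field: two blanks match,
    a blank never matches a number.
    """
    parsed, gold = parsed.strip(), gold.strip()
    if not parsed or not gold:
        return parsed == gold
    # Build a quantum such as 0.001 for a gold value with 3 decimal places.
    quantum = Decimal(1).scaleb(-decimal_places(gold))
    rounded = Decimal(parsed).quantize(quantum, rounding=ROUND_HALF_UP)
    return rounded == Decimal(gold)
```

For example, `matches_gold("60.4667", "60.467")` and `matches_gold("-152.583", "-152.6")` both return `True`, while `matches_gold("60.15", "60.16")` returns `False`.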

