Dataset Errata

Gold Parsed NY01075791_lg.csv converts the "u" in "Mull" to an umlaut, yielding "Müll". This reflects the original label but not the Gold OCR file NY01075791_lg.txt, which has "Mull". The same applies to NY01075792_lg.csv and several others in the series. (Bryan: The OCR messed up. Gold should fix OCR errors, so the umlaut should stay.)  


----
 
'''Gold Parsed CSV Files''' There are more errors in gold csv files. (Qianjin) <br>
'''(Bryan: I agree with Qianjin's edits except as noted below)'''  
 
NY01075759_lg: verbatimEventDate is 1998-04-19; it should be 19 April 1998.
:::From Deb: in /home/aocr/webroot/datasets/lichens/gold/parsed, the CSV file opened in Notepad++ has 19 April 1998, and the encoding is ANSI. --[[User:Dpaul|Dpaul]] 21:20, 30 June 2013 (EDT)
:::Here's what it looks like when opened in Notepad++:
<pre>dwc:recordedBy, dwc:recordNumber, dwc:verbatimCoordinates, dwc:verbatimEventDate, dwc:eventDate, dwc:municipality, dwc:county, dwc:stateProvince, dwc:country, aocr:verbatimScientificName, dwc:verbatimLocality, dwc:habitat, dwc:substrate, dwc:verbatimElevation, dwc:identifiedBy, dwc:dateIdentified, dwc:verbatimLatitude, dwc:verbatimLongitude, dwc:catalogNumber, aocr:verbatimInstitution, dwc:datasetName, dwc:scientificName, dwc:decimalLatitude, dwc:decimalLongitude, dwc:fieldNotes, dwc:sex
Richard C. Harris,42164,"41°11'N, 74°08'W",19 April 1998,1998-04-19,,Rockland,NEW YORK,U.S.A.,,"Harriman State Park, along Woodtown Road West near dam at S end of Lake Sebago along Seven Lakes Drive",mixed hardwood-hemlock forest with granitic erratics.,on Trapelia placodioides Coppins & P. James,ca. 240 m,,,41°11'N,74°08'W,01075759,New York Botanical Garden,Lichens of New York State,Polycoccum minutulum Kocourkova & F. Berger,,,,</pre>
 
:::Here's what it looks like when I open it at the command line in a text editor (vi):
<pre>dwc:recordedBy, dwc:recordNumber, dwc:verbatimCoordinates, dwc:verbatimEventDate, dwc:eventDate, dwc:municipality, dwc:county, dwc:stateProvince, dwc:country, aocr:verbatimScientificName, dwc:verbatimLocality, dwc:habitat, dwc:substrate, dwc:verbatimElevation, dwc:identifiedBy, dwc:dateIdentified, dwc:verbatimLatitude, dwc:verbatimLongitude, dwc:catalogNumber, aocr:verbatimInstitution, dwc:datasetName, dwc:scientificName, dwc:decimalLatitude, dwc:decimalLongitude, dwc:fieldNotes, dwc:sex
Richard C. Harris,42164,"41°11'N, 74°08'W",1998-04-19,4/19/1998,,Rockland,NEW RK,U.S.A.,,"Harriman State Park, along Woodtown Road West near dam at S end of Lake Sebago along Seven Lakes Drive",mixed hardwood-hemlock forest with granitic erratics.,on Trapelia placodioides Coppins & P. James,ca. 240 m,,,41°11'N,74°0W,01075759,New York Botanical Garden,Lichens of New York State,Polycoccum minutulum Kocourkova & F. Berger,,,,</pre>
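One way to settle which value is actually stored, independent of how an editor renders the file, is to decode the raw bytes and parse the CSV directly. The following is a minimal Python sketch, assuming that "ANSI" here means Windows code page 1252 and that the header separates names with ", " as in the dumps above; the function name `read_gold_row` is hypothetical and not part of the dataset tooling.

```python
import csv
import io


def read_gold_row(raw: bytes, encoding: str = "cp1252") -> dict:
    """Decode a gold parsed CSV and return its first record as a dict.

    Assumption: "ANSI" means Windows code page 1252 (cp1252).
    The header uses ", " between names, so skipinitialspace=True
    drops the space that follows each delimiter.
    """
    text = raw.decode(encoding)
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return next(reader)  # raises StopIteration if there is no data row
```

Passing the raw bytes of the server copy (for example from `pathlib.Path(p).read_bytes()`) and printing `repr()` of the `dwc:verbatimEventDate` and `dwc:eventDate` values would show what the file holds, including any stray characters an editor might hide.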




NY01075760_lg: no datasetName  
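To look for other gold files with the same gap, a scan of the parsed directory for records whose dwc:datasetName is empty could be sketched as follows. These are hypothetical Python helpers, not part of the dataset tooling; they assume one record per file, cp1252 ("ANSI") encoding, and the ", "-separated header shown earlier.

```python
import csv
import io
from pathlib import Path


def record_lacks_field(raw: bytes, field: str = "dwc:datasetName",
                       encoding: str = "cp1252") -> bool:
    """True if the first record is missing `field` or has it blank."""
    text = raw.decode(encoding)
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    row = next(reader, None)  # None when the file has only a header
    if row is None:
        return True
    # DictReader returns None for a column absent from the header
    # and fills short rows with None, so treat None as blank.
    return not (row.get(field) or "").strip()


def files_lacking_field(directory: str, field: str = "dwc:datasetName") -> list:
    """Names of *.csv files in `directory` whose first record lacks `field`."""
    return [p.name for p in sorted(Path(directory).glob("*.csv"))
            if record_lacks_field(p.read_bytes(), field)]
```

Pointing `files_lacking_field` at /home/aocr/webroot/datasets/lichens/gold/parsed would list any such files, which could then be checked against the original labels.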