Line 32: | Line 32: | ||
('''Deb''': if there are many '''country''' errors on these, where the error is one of putting country in by deducing it because the state is in the USA, ...removing the value will take some time to fix as every single record will need to be verified. --[[User:Dpaul|Dpaul]] 16:40, 12 June 2013 (EDT))<br> | ('''Deb''': if there are many '''country''' errors on these, where the error is one of putting country in by deducing it because the state is in the USA, ...removing the value will take some time to fix as every single record will need to be verified. --[[User:Dpaul|Dpaul]] 16:40, 12 June 2013 (EDT))<br> | ||
-- Gold Parsed TENN-L-0000001_lg.csv lists country as "USA", but on the .txt label, it is "U.S.A." (with periods). Same with Gold Parsed TENN-L-0000035_lg.csv and others.(Daryl) (Bryan: Agreed. Should be fixed to match the label.) (Ed: fixed, note that TENN-L-0000035_lg.txt has "U. S. A. " with spaces, thus conserved format) | -- Gold Parsed TENN-L-0000001_lg.csv lists country as "USA", but on the .txt label, it is "U.S.A." (with periods). Same with Gold Parsed TENN-L-0000035_lg.csv and others. ('''Daryl''') | ||
<br> | |||
('''Bryan''': Agreed. Should be fixed to match the label.) | |||
<br> | |||
('''Ed''': fixed, note that TENN-L-0000035_lg.txt has "U. S. A. " with spaces, thus conserved format) | |||
-- Gold Parsed TENN-L-0000005_lg.csv leaves country blank, but the label shows it as "USA". Again, maybe this is OK, but it should be consistent. (Daryl) (Bryan: Agreed. Should be fixed to match the OCR label.) (Ed: Fixed, country had county value) | -- Gold Parsed TENN-L-0000005_lg.csv leaves country blank, but the label shows it as "USA". Again, maybe this is OK, but it should be consistent. ('''Daryl''') | ||
<br> | |||
('''Bryan''': Agreed. Should be fixed to match the OCR label.) | |||
<br> | |||
('''Ed''': Fixed, country had county value) | |||
<br> | <br> | ||
<br> | <br> | ||
<br> Inconsistency and errors in TENN Lichen Gold Parsed dateIdentified. Examples: | <br> | ||
Inconsistency and errors in TENN Lichen Gold Parsed dateIdentified. Examples: | |||
-- TENN-L-0000015_lg.csv has dateIdentified in the wrong format, neither verbatim, nor standard DarwinCore format: Verbatim would be: Nov. 12, 1939, DarwinCore would be: 1939-11-12, Listed is: 1939-November-12. (Bryan: I think excel may have imposed it's own format and changed the records. If the column type is set to "text" Excel will not transform to a new format. ) (Ed: The verbatimEventDate (not dateIdentified is in the correct format, but if you open file in excel it will convert the display to match program's defaults) | -- TENN-L-0000015_lg.csv has dateIdentified in the wrong format, neither verbatim, nor standard DarwinCore format: Verbatim would be: Nov. 12, 1939, DarwinCore would be: 1939-11-12, Listed is: 1939-November-12. | ||
<br> | |||
('''Bryan''': I think Excel may have imposed its own format and changed the records. If the column type is set to "text", Excel will not transform the values to a new format.) | |||
<br> | |||
('''Ed''': The verbatimEventDate (not dateIdentified) is in the correct format, but if you open the file in Excel it will convert the display to match the program's defaults.) | |||
-- TENN-L-0000017_lg.csv omits dateIdentified, though it is on the label as 3 Feb. 1963 (Bryan: Agreed. Should be fixed to match the label.) (Ed: That is the collection date, not the dateIdentified. Format is correct in verbatimDate) | -- TENN-L-0000017_lg.csv omits dateIdentified, though it is on the label as 3 Feb. 1963 | ||
<br> | |||
('''Bryan''': Agreed. Should be fixed to match the label.) | |||
<br> | |||
('''Ed:''' That is the collection date, not the dateIdentified. Format is correct in verbatimDate) | |||
-- TENN-L-0000019_lg.csv has 1954-Aug-8, but on the label it is "8 Aug 1954", again neither verbatim nor DarwinCore (1954-08-08). (Daryl) (Bryan: Agreed. Should be fixed to match the label.) | -- TENN-L-0000019_lg.csv has 1954-Aug-8, but on the label it is "8 Aug 1954", again neither verbatim nor DarwinCore (1954-08-08). ('''Daryl''') | ||
<br> | |||
('''Bryan''': Agreed. Should be fixed to match the label.) | |||
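<br>
The date examples above distinguish a verbatim form (as printed on the label) from the DarwinCore form (YYYY-MM-DD). The following is only a minimal Python sketch of how a verbatim date could be checked against its DarwinCore equivalent. The function name <code>to_iso_date</code> and the list of accepted label formats are assumptions made for illustration; they are not part of the gold-set tooling, and real labels will need more formats than the three shown here.

```python
from datetime import datetime

# Hypothetical list of verbatim label formats, taken from the three
# examples discussed above; real labels will use many more.
LABEL_FORMATS = (
    "%b. %d, %Y",  # Nov. 12, 1939
    "%d %b. %Y",   # 3 Feb. 1963
    "%d %b %Y",    # 8 Aug 1954
)

def to_iso_date(verbatim, formats=LABEL_FORMATS):
    """Return the YYYY-MM-DD form of a verbatim label date, or None.

    Month abbreviations are matched using the current locale
    (English names in Python's default "C" locale).
    """
    text = verbatim.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match, try the next one
    return None  # not a recognized verbatim form

if __name__ == "__main__":
    for label_date in ("Nov. 12, 1939", "3 Feb. 1963", "8 Aug 1954",
                       "1939-November-12"):
        print(repr(label_date), "->", to_iso_date(label_date))
```

A mixed value such as 1939-November-12 matches none of the assumed formats and returns <code>None</code>, so it could be flagged for manual review rather than silently converted.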
Gold Parsed NY01075760_lg.csv replaces the comma with a space, and replaces an apostrophe (') with a double quote (") in verbatimCoordinates: 38°42'20"N, 83°08'25'W is rendered as 38°42'20""N 83°08'25""W. (Note also that the double quote is replaced with two double quotes. This may be necessary to preserve the quote-delimited, comma separated fields, but could cause some problems when uploading to a database. Not presented here as an error, but we should be aware of possible implications.) (Bryan: Agreed. Should be fixed to match the Label except, the double quoted double quote I think is needed for CSV readers to identify fileds. I am not sure) | Gold Parsed NY01075760_lg.csv replaces the comma with a space, and replaces an apostrophe (') with a double quote (") in verbatimCoordinates: 38°42'20"N, 83°08'25'W is rendered as 38°42'20""N 83°08'25""W. (Note also that the double quote is replaced with two double quotes. This may be necessary to preserve the quote-delimited, comma separated fields, but could cause some problems when uploading to a database. Not presented here as an error, but we should be aware of possible implications.) | ||
<br> | |||
('''Bryan''': Agreed. Should be fixed to match the label, except that I think the doubled double quote is needed for CSV readers to identify fields. I am not sure.) | |||
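<br>
On the doubled double quote: in the common quote-delimited CSV convention (the one Python's standard <code>csv</code> module follows by default), a double quote inside a quoted field is written as two double quotes, and a conforming reader turns it back into one. The sketch below illustrates this with the label value quoted above. It is an illustration only: the helper names <code>write_row</code> and <code>read_row</code> are made up here, and this is not necessarily how the gold files were produced.

```python
import csv
import io

def write_row(fields):
    """Serialize one row to CSV text using the csv module defaults."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()

def read_row(text):
    """Parse the first row back out of CSV text."""
    return next(csv.reader(io.StringIO(text)))

# verbatimCoordinates as given on the label in the discussion above
coords = "38°42'20\"N, 83°08'25'W"

raw = write_row(["NY01075760", coords])
parsed = read_row(raw)

if __name__ == "__main__":
    print(raw)                  # on disk the field is quoted and its " is doubled
    print(parsed[1] == coords)  # after parsing, the original value is restored
```

If this reading is right, the doubling exists only in the file on disk. A database loader that parses the file with a CSV library should receive the single quote, while a loader that splits on commas by hand would not, and would also break on the comma inside the field.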
Gold Parsed NY01075764_lg.csv has a similar problem where a single space is replaced with a double space in verbatimCoordinates. (Bryan: Agreed. Should be fixed to match the label as best as possible. If it is not clear follow the OCR file.) | Gold Parsed NY01075764_lg.csv has a similar problem where a single space is replaced with a double space in verbatimCoordinates. | ||
<br> | |||
('''Bryan:''' Agreed. Should be fixed to match the label as best as possible. If it is not clear follow the OCR file.) | |||
Inconsistencies in several Gold Parsed labels regarding whether to include the period at the end of a field as part of the field. Example: verbatimCoordinates in NY01075782_lg.csv includes the period at the end. NY01075780_lg.csv does not include the period. (Bryan: It could go either way but I think for consistency throughout we should keep the period at the end of anything. Whenever there are sentences with a period we keep them as in "One mile east of Dodge City." we would not think of removing the period. In gold we should treat it as verbatim. If we do platinum it could be removed.) | Inconsistencies in several Gold Parsed labels regarding whether to include the period at the end of a field as part of the field. Example: verbatimCoordinates in NY01075782_lg.csv includes the period at the end. NY01075780_lg.csv does not include the period. (Bryan: It could go either way but I think for consistency throughout we should keep the period at the end of anything. Whenever there are sentences with a period we keep them as in "One mile east of Dodge City." we would not think of removing the period. In gold we should treat it as verbatim. If we do platinum it could be removed.)