Data Quality Toolkit 2024: Difference between revisions

From iDigBio
This page was inspired by Bob Mesibov's [https://www.datafix.com.au/cookbook/ Data Cleaner's Cookbook], GBIF's [https://data-blog.gbif.org/post/issues-and-flags/ data quality flags], and iDigBio's [https://github.com/iDigBio/idigbio-search-api/wiki/Data-Quality-Flags data quality flags].

If you already know which tool or CMS you are using to clean your data, you can visit a tool- and CMS-specific toolkit: [[Arctos Data Quality Toolkit|Arctos]], [[Excel Data Quality Toolkit|Excel]], [[Specify Data Quality Toolkit|Specify]], [https://biokic.github.io/symbiota-docs/editor/quality/ Symbiota], [[TaxonWorks Data Quality Toolkit|TaxonWorks]]. Additional command line tools can be found in Bob Mesibov's [https://www.datafix.com.au/darwin-core-checker/ Darwin Core Checker tool].

== Catalog Numbers and Other Identifiers==

'''Solutions:'''
* [https://handbook.arctosdb.org/documentation/catalog.html#catalog-number Arctos]
* [[Excel Data Quality Toolkit#Duplicate Catalog Numbers|Excel]]
* [[OpenRefine Data Quality Toolkit#Duplicate Catalog Numbers|OpenRefine]]
* [[Specify Data Quality Toolkit#Duplicate Catalog Numbers|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#duplicate-catalog-numbers Symbiota]
* [[TaxonWorks Data Quality Toolkit#Duplicate Catalog Numbers|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Date Hasn't Happened Yet|OpenRefine]]
* [[Specify Data Quality Toolkit#Date Hasn't Happened Yet|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#date-hasnt-happened-yet Symbiota]
* [[TaxonWorks Data Quality Toolkit#Date Hasn't Happened Yet|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Date is Suspiciously Old|OpenRefine]]
* [[Specify Data Quality Toolkit#Date is Suspiciously Old|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#date-is-suspiciously-old Symbiota]
* [[TaxonWorks Data Quality Toolkit#Date is Suspiciously Old|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Identified Date Earlier than Collected Date|OpenRefine]]
* [[Specify Data Quality Toolkit#Identified Date Earlier than Collected Date|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#identified-date-earlier-than-collected-date Symbiota]
* [[TaxonWorks Data Quality Toolkit#Identified Date Earlier than Collected Date|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Year, Month, and Day Values Do Not Match Date|OpenRefine]]
* [[Specify Data Quality Toolkit#Year, Month, and Day Values Do Not Match Date|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#year-month-and-day-values-do-not-match-date Symbiota]
* [[TaxonWorks Data Quality Toolkit#Year, Month, and Day Values Do Not Match Date|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Coordinates are Zero|OpenRefine]]
* [[Specify Data Quality Toolkit#Coordinates are Zero|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#coordinates-are-zero Symbiota]
* [[TaxonWorks Data Quality Toolkit#Coordinates are Zero|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Coordinates Do Not Fall Within Named Geographic Unit|OpenRefine]]
* [[Specify Data Quality Toolkit#Coordinates Do Not Fall Within Named Geographic Unit|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#coordinates-do-not-fall-within-named-geographic-unit Symbiota]
* [[TaxonWorks Data Quality Toolkit#Coordinates Do Not Fall Within Named Geographic Unit|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Georeference Metadata with no Associated Georeference|OpenRefine]]
* [[Specify Data Quality Toolkit#Georeference Metadata with no Associated Georeference|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#georeference-metadata-with-no-associated-georeference Symbiota]
* [[TaxonWorks Data Quality Toolkit#Georeference Metadata with no Associated Georeference|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Elevation is Unlikely|OpenRefine]]
* [[Specify Data Quality Toolkit#Elevation is Unlikely|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#elevation-is-unlikely Symbiota]
* [[TaxonWorks Data Quality Toolkit#Elevation is Unlikely|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Improperly Negated Latitudes/Longitudes|OpenRefine]]
* [[Specify Data Quality Toolkit#Improperly Negated Latitudes/Longitudes|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#improperly-negated-latitudeslongitudes Symbiota]
* [[TaxonWorks Data Quality Toolkit#Improperly Negated Latitudes/Longitudes|TaxonWorks]]

=== Invalid Coordinates ===
'''Problem:''' Coordinates deviate from accepted ranges or formats, such as decimalLatitude and decimalLongitude values falling outside the ranges -90 to 90 and -180 to 180, respectively. verbatimCoordinates must be valid coordinates in decimal degrees, degrees decimal minutes, or degrees minutes seconds.

'''Solutions:'''
* [[OpenRefine Data Quality Toolkit#Invalid Coordinates|OpenRefine]]
* [[Specify Data Quality Toolkit#Invalid Coordinates|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#invalid-coordinates Symbiota]
* [[TaxonWorks Data Quality Toolkit#Invalid Coordinates|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Lower Geography Values are Provided, but No Higher Geography|OpenRefine]]
* [[Specify Data Quality Toolkit#Lower Geography Values are Provided, but No Higher Geography|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#lower-geography-values-are-provided-but-no-higher-geography Symbiota]
* [[TaxonWorks Data Quality Toolkit#Lower Geography Values are Provided, but No Higher Geography|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Minimum and Maximum Elevation Values Mismatched|OpenRefine]]
* [[Specify Data Quality Toolkit#Minimum and Maximum Elevation Values Mismatched|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#minimum-and-maximum-elevation-values-mismatched Symbiota]
* [[TaxonWorks Data Quality Toolkit#Minimum and Maximum Elevation Values Mismatched|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Mismatched Country and CountryCode Values|OpenRefine]]
* [[Specify Data Quality Toolkit#Mismatched Country and CountryCode Values|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#mismatched-country-and-countrycode-values Symbiota]
* [[TaxonWorks Data Quality Toolkit#Mismatched Country and CountryCode Values|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Mismatched Geographic Terms|OpenRefine]]
* [[Specify Data Quality Toolkit#Mismatched Geographic Terms|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#mismatched-geographic-terms Symbiota]
* [[TaxonWorks Data Quality Toolkit#Mismatched Geographic Terms|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Missing Geodetic Datum|OpenRefine]]
* [[Specify Data Quality Toolkit#Missing Geodetic Datum|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#missing-geodetic-datum Symbiota]
* [[TaxonWorks Data Quality Toolkit#Missing Geodetic Datum|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Missing Latitudes/Longitudes|OpenRefine]]
* [[Specify Data Quality Toolkit#Missing Latitudes/Longitudes|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#missing-latitudeslongitudes Symbiota]
* [[TaxonWorks Data Quality Toolkit#Missing Latitudes/Longitudes|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Misspelled Geographic Unit Names|OpenRefine]]
* [[Specify Data Quality Toolkit#Misspelled Geographic Unit Names|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#misspelled-geographic-unit-names Symbiota]
* [[TaxonWorks Data Quality Toolkit#Misspelled Geographic Unit Names|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Misspelled or Invalid Taxonomic Names|OpenRefine]]
* [[Specify Data Quality Toolkit#Misspelled or Invalid Taxonomic Names|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#misspelled-or-invalid-taxonomic-names Symbiota]
* [[TaxonWorks Data Quality Toolkit#Misspelled or Invalid Taxonomic Names|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Unknown Higher Taxonomy|OpenRefine]]
* [[Specify Data Quality Toolkit#Unknown Higher Taxonomy|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#unknown-higher-taxonomy Symbiota]
* [[TaxonWorks Data Quality Toolkit#Unknown Higher Taxonomy|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Incorrect Character Encodings|OpenRefine]]
* [[Specify Data Quality Toolkit#Incorrect Character Encodings|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#incorrect-character-encodings Symbiota]
* [[TaxonWorks Data Quality Toolkit#Incorrect Character Encodings|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Incorrect Line Endings|OpenRefine]]
* [[Specify Data Quality Toolkit#Incorrect Line Endings|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#incorrect-line-endings Symbiota]
* [[TaxonWorks Data Quality Toolkit#Incorrect Line Endings|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Invalid Individual Count|OpenRefine]]
* [[Specify Data Quality Toolkit#Invalid Individual Count|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#invalid-individual-count Symbiota]
* [[TaxonWorks Data Quality Toolkit#Invalid Individual Count|TaxonWorks]]

* [[OpenRefine Data Quality Toolkit#Non-standardized BasisOfRecord Values|OpenRefine]]
* [[Specify Data Quality Toolkit#Non-standardized BasisOfRecord Values|Specify]]
* [https://biokic.github.io/symbiota-docs/editor/quality/#non-standardized-basisofrecord-values Symbiota]
* [[TaxonWorks Data Quality Toolkit#Non-standardized BasisOfRecord Values|TaxonWorks]]

Latest revision as of 17:19, 10 April 2024


Overview

This page was created to aggregate common data quality issues and potential solutions to those issues in collection management systems and CMS-agnostic tools. Data quality issues are grouped into data categories, and links to resources for identifying and fixing the issues are provided.

This page was inspired by Bob Mesibov's Data Cleaner's Cookbook, GBIF's data quality flags, and iDigBio's data quality flags.

If you already know which tool or CMS you are using to clean your data, you can visit a tool- and CMS-specific toolkit: Arctos, Excel, Specify, Symbiota, TaxonWorks. Additional command line tools can be found in Bob Mesibov's Darwin Core Checker tool.

Catalog Numbers and Other Identifiers

Duplicate Catalog Numbers

Problem: The same catalog number is used multiple times within your dataset. (This may or may not be intentional, depending on your collection's policies. It is generally best not to duplicate catalog numbers when possible.)

Solutions:
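
As a tool-agnostic illustration, the check can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for the tool-specific instructions: it assumes records are dictionaries keyed by Darwin Core term names, and the function name and sample data are hypothetical.

```python
from collections import Counter

def find_duplicate_catalog_numbers(records):
    """Return catalog numbers that occur more than once, with their counts."""
    counts = Counter()
    for rec in records:
        value = rec.get("catalogNumber")
        # Strip whitespace so "A-001 " and "A-001" count as the same value.
        text = "" if value is None else str(value).strip()
        # Blank catalog numbers are a separate problem, so they are skipped.
        if text:
            counts[text] += 1
    return {number: n for number, n in counts.items() if n > 1}

# Hypothetical sample data.
records = [
    {"catalogNumber": "A-001"},
    {"catalogNumber": "A-002"},
    {"catalogNumber": "A-001 "},
    {"catalogNumber": ""},
    {"catalogNumber": ""},
]
duplicates = find_duplicate_catalog_numbers(records)
print(duplicates)
```

The result maps each repeated catalog number to the number of times it occurs. Whether a repeat is actually an error still depends on your collection's policies.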

Dates

Date Hasn't Happened Yet

Problem: The date the specimen was identified, collected (often designated using the eventDate field), or georeferenced is in the future.

Solutions:
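
Outside of any particular tool, this check amounts to comparing each date field with today's date. The sketch below is illustrative only: it assumes dates are stored as full ISO 8601 strings (YYYY-MM-DD) under the Darwin Core term names eventDate, dateIdentified, and georeferencedDate, and the function name is hypothetical.

```python
from datetime import date

DATE_FIELDS = ("eventDate", "dateIdentified", "georeferencedDate")

def future_date_fields(record, today=None):
    """Return the names of date fields whose value lies after today."""
    today = today or date.today()
    flagged = []
    for field in DATE_FIELDS:
        value = record.get(field)
        if not value:
            continue
        try:
            parsed = date.fromisoformat(value)
        except (ValueError, TypeError):
            # Partial or unparseable dates need a separate check.
            continue
        if parsed > today:
            flagged.append(field)
    return flagged

# Hypothetical record, checked against a fixed reference date.
record = {"eventDate": "2031-01-01", "dateIdentified": "2020-05-05"}
print(future_date_fields(record, today=date(2024, 4, 10)))
```

Passing `today` explicitly keeps the check reproducible. In routine use it can be omitted so that the current date is used.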

Date is Suspiciously Old

Problem: The date the specimen was identified, collected (often designated using the eventDate field), or georeferenced is outside the expected historical date range. The expected date range depends on the institution, but it is unlikely that most collections have specimens with dates prior to 1600.

Solutions:

Identified Date Earlier than Collected Date

Problem: The date the specimen was identified (dateIdentified field) is earlier than the date the specimen was collected (eventDate).

Solutions:

Year, Month, and Day Values Do Not Match Date

Problem: The event year, month, and day values do not match the provided event date. The event date is often the date of collection for preserved specimens.

Solutions:
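
For readers working in a script rather than a CMS, the comparison can be sketched as follows. This is a minimal sketch under stated assumptions: eventDate is a single full ISO 8601 date (not a range or a partial date), the year, month, and day fields hold integers or integer strings, and the function name is hypothetical.

```python
from datetime import date

def ymd_matches_event_date(record):
    """Return True or False for a match, or None if it cannot be checked."""
    try:
        event = date.fromisoformat(record["eventDate"])
        parts = (int(record["year"]), int(record["month"]), int(record["day"]))
    except (KeyError, ValueError, TypeError):
        # A missing field, a date range, or a partial date cannot be compared here.
        return None
    return parts == (event.year, event.month, event.day)

# Hypothetical records.
good = {"eventDate": "1998-05-17", "year": "1998", "month": "5", "day": "17"}
bad = {"eventDate": "1998-05-17", "year": "1998", "month": "5", "day": "18"}
print(ymd_matches_event_date(good), ymd_matches_event_date(bad))
```

Returning None for records that cannot be compared keeps them separate from genuine mismatches, so they can be reviewed on their own.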

Geography

Coordinates are Zero

Problem: The provided latitude and longitude values are 0.

Solutions:

Coordinates Do Not Fall Within Named Geographic Unit

Problem: The provided coordinates do not fall within the geographic boundaries of the named country, state, and/or county.

Solutions:

Georeference Metadata with no Associated Georeference

Problem: Metadata fields regarding coordinates, such as coordinateUncertaintyInMeters, georeferenceProtocol, georeferenceSources, georeferencedBy, georeferenceRemarks, and geodeticDatum, are provided, but no coordinates are present. This is sometimes intentional, particularly when georeferencedBy and georeferenceRemarks are used to indicate that a record was purposefully not georeferenced. However, it is rare that the other metadata fields can be used without associated coordinates (i.e., decimalLatitude, [https://dwc.tdwg.org/terms/#dwc:decimalLongitude decimalLongitude], or verbatimCoordinates).

Solutions:
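
The logic described above can be sketched in Python. This is an illustrative sketch, assuming records are dictionaries keyed by Darwin Core term names. Following the note above, georeferencedBy and georeferenceRemarks are deliberately left out of the flagged fields; the function names are hypothetical.

```python
# Metadata fields that rarely make sense without coordinates.
GEOREF_METADATA_FIELDS = (
    "coordinateUncertaintyInMeters",
    "georeferenceProtocol",
    "georeferenceSources",
    "geodeticDatum",
)
COORDINATE_FIELDS = ("decimalLatitude", "decimalLongitude", "verbatimCoordinates")

def has_value(record, field):
    """True when the field is present and not blank."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""

def orphaned_georeference_metadata(record):
    """Return filled metadata fields on a record that has no coordinates."""
    if any(has_value(record, f) for f in COORDINATE_FIELDS):
        return []
    return [f for f in GEOREF_METADATA_FIELDS if has_value(record, f)]

# Hypothetical record: metadata present, coordinates blank.
record = {
    "geodeticDatum": "WGS84",
    "coordinateUncertaintyInMeters": "500",
    "decimalLatitude": "",
    "georeferencedBy": "J. Smith",
}
flagged = orphaned_georeference_metadata(record)
print(flagged)
```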

Elevation is Unlikely

Problem: Elevation values are either too high (above 17000 m) or too low (below -11000 m) to occur on Earth.

Solutions:

Improperly Negated Latitudes/Longitudes

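Problem: The sign of the latitude (decimalLatitude) or longitude (decimalLongitude) does not match the sign/hemisphere of the given country. For example, nearly all longitudes in the U.S. should be negative; the exceptions lie in the Eastern Hemisphere, such as Guam and the westernmost Aleutian Islands.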

Solutions:

Invalid Coordinates

Problem: Coordinates deviate from accepted ranges or formats, such as decimalLatitude and decimalLongitude values falling outside the ranges -90 to 90 and -180 to 180, respectively. verbatimCoordinates must be valid coordinates in decimal degrees, degrees decimal minutes, or degrees minutes seconds.

Solutions:
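
The range part of this check is simple to express in code. The sketch below covers only decimalLatitude and decimalLongitude; validating the many possible verbatimCoordinates formats is out of scope here. It assumes records are dictionaries keyed by Darwin Core term names, and the function name is hypothetical.

```python
def coordinate_problems(record):
    """Return a list of messages describing invalid decimal coordinates."""
    problems = []
    for field, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        raw = record.get(field)
        if raw is None or str(raw).strip() == "":
            continue  # Missing values are a separate issue.
        try:
            value = float(raw)
        except (ValueError, TypeError):
            problems.append(f"{field} is not a number: {raw!r}")
            continue
        # NaN fails this comparison too, so it is reported as out of range.
        if not -limit <= value <= limit:
            problems.append(f"{field} out of range: {value}")
    return problems

# Hypothetical records.
print(coordinate_problems({"decimalLatitude": "91.2", "decimalLongitude": "-82.3"}))
print(coordinate_problems({"decimalLatitude": "29.65", "decimalLongitude": "-82.32"}))
```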

Lower Geography Values are Provided, but No Higher Geography

Problem: Lower geography (e.g., county, state/province) values exist, but no higher geography values (e.g., country) are provided.

Solutions:

Minimum and Maximum Elevation Values Mismatched

Problem: The minimum elevation (minimumElevationInMeters) has a greater value than the maximum elevation (maximumElevationInMeters).

Solutions:

Mismatched Country and CountryCode Values

Problem: The provided values for country and countryCode do not match.

Solutions:

Mismatched Geographic Terms

Problem: A record has lower geographic terms (e.g., state/province, county) that do not exist under the provided higher geographic term(s). For example, country = Canada and stateProvince = Sussex. There is no Sussex province in Canada.

Solutions:

Missing Geodetic Datum

Problem: The geodetic datum is a key part of a properly georeferenced specimen record, but the field is usually left blank. Although the datum is commonly assumed to be WGS84, it should be recorded explicitly rather than assumed.

Solutions:

Missing Latitudes/Longitudes

Problem: A record has a latitude value, but not a longitude value, or vice versa.

Solutions:

Misspelled Geographic Unit Names

Problem: The geographic units (e.g., country, state/province, county) are misspelled, resulting in poor matching of geographic unit names to existing geographic lists.

Solutions:
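
One way to surface likely misspellings without a dedicated tool is approximate string matching against a list of accepted names. The sketch below uses Python's standard difflib module. The accepted list, the 0.8 similarity cutoff, and the function name are illustrative assumptions, and every suggestion should be reviewed by a person before it is applied.

```python
import difflib

def suggest_corrections(values, accepted_names, cutoff=0.8):
    """Map each unrecognized value to its closest accepted name, or None."""
    suggestions = {}
    for value in set(values):
        if value in accepted_names:
            continue  # Exact matches need no correction.
        close = difflib.get_close_matches(value, accepted_names, n=1, cutoff=cutoff)
        suggestions[value] = close[0] if close else None
    return suggestions

# Hypothetical stateProvince values checked against a short accepted list.
accepted = ["Florida", "Georgia", "Alabama"]
found = suggest_corrections(["Florida", "Flordia", "Atlantis"], accepted)
print(found)
```

Values mapped to None have no close match in the list. They may be valid names missing from the list rather than misspellings.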

Taxonomy

Misspelled or Invalid Taxonomic Names

Problem: Scientific names are misspelled, resulting in poor matching of taxonomic names to taxonomic databases.

Solutions:

Unknown Higher Taxonomy

Problem: Species may be missing higher taxonomic information.

Solutions:

Other Issues

Incorrect Character Encodings

Problem: Data inconsistencies arise when incorrect character encodings are used during data manipulation or transfer. This issue occurs when datasets are opened, downloaded, or imported across different software platforms, leading to misinterpretation and garbled text. For instance, special characters such as accented letters or symbols (e.g., the é in Carl Linné) may be rendered incorrectly, affecting the readability and accuracy of the data.

Solutions:
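
A common form of this problem is UTF-8 text that was read as Windows-1252, which turns one accented letter into two unrelated characters. The sketch below reverses that specific mistake. It is only a sketch: the wrong encoding is assumed to be cp1252, other encoding mix-ups need different handling, and the function name is hypothetical.

```python
def repair_mojibake(text, wrong_encoding="cp1252"):
    """Undo UTF-8 text that was mistakenly decoded as wrong_encoding.

    The text is returned unchanged when the round trip fails, which
    usually means it was not garbled in this particular way.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

# "\u00c3\u00a9" is how a UTF-8 "\u00e9" (e with acute accent) appears
# after being decoded as cp1252.
garbled = "Carl Linn\u00c3\u00a9"
repaired = repair_mojibake(garbled)
print(repaired)
```

Correctly encoded text passes through unchanged, so the function can be applied to a whole column. Spot-check the results before overwriting the original values.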

Incorrect Line Endings

Problem: When transferring text files between Unix/Linux and DOS/Windows systems, line endings can become inconsistent. Unix/Linux systems typically use line feed (LF) characters, while DOS/Windows systems use carriage return (CR) and line feed (LF) combinations. This mismatch can result in extra characters appearing in the data, causing visual artifacts and processing errors.

Solutions:
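
To see which line endings a file actually contains, and to normalize them, the bytes can be inspected directly. This is a minimal sketch with hypothetical function names; it works on raw bytes so that no encoding has to be assumed.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CR not followed by LF
    }

def normalize_to_lf(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Hypothetical file contents with mixed line endings.
sample = b"id,name\r\n1,Quercus\r\n2,Acer\n"
print(count_line_endings(sample))
print(normalize_to_lf(sample))
```

In practice the bytes would come from opening the file in binary mode (`open(path, "rb")`). Note that this also rewrites line breaks embedded inside quoted CSV fields.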

Invalid Individual Count

Problem: Some individualCount values do not make sense as positive integers (for example, negative numbers, decimals, or non-numeric text).

Solutions:
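
The test itself is short when written as code. The sketch below follows the wording above and treats only whole numbers greater than zero as valid; whether zero should be allowed depends on how your dataset uses the field. The function name is hypothetical.

```python
def is_valid_individual_count(value):
    """True when the value is a whole number greater than zero."""
    text = str(value).strip()
    # isascii() excludes characters such as superscript digits,
    # which isdigit() alone would accept.
    return text.isascii() and text.isdigit() and int(text) > 0

# Hypothetical values as they might appear in a spreadsheet export.
for value in ["3", " 12 ", "0", "-2", "3.5", "several", ""]:
    print(repr(value), is_valid_individual_count(value))
```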

Non-standardized BasisOfRecord Values

Problem: Values in the BasisOfRecord field do not match the recommended controlled vocabulary. While using standardized terms in this field is not strictly necessary, doing so does improve the discoverability and interoperability of your data.

The currently accepted values for BasisOfRecord include: MaterialEntity, PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample, Event, HumanObservation, MachineObservation, Taxon, Occurrence, MaterialCitation.

Note that even punctuation and capitalization differences in these values (e.g., Preserved Specimen) are discouraged.

Solutions:
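
Independently of any CMS, values can be compared with the list of accepted terms given above. The sketch below also reports near matches that differ only in spacing, punctuation, or capitalization. The function name and status labels are hypothetical.

```python
# Accepted values as listed on this page.
ACCEPTED_BASIS_OF_RECORD = (
    "MaterialEntity", "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "MaterialSample", "Event", "HumanObservation", "MachineObservation",
    "Taxon", "Occurrence", "MaterialCitation",
)

def check_basis_of_record(value):
    """Return (status, suggested term) for a basisOfRecord value."""
    value = value or ""
    if value in ACCEPTED_BASIS_OF_RECORD:
        return ("ok", value)
    # Ignore case, spaces, and punctuation when looking for a near match.
    key = "".join(ch for ch in value if ch.isalnum()).lower()
    for term in ACCEPTED_BASIS_OF_RECORD:
        if term.lower() == key:
            return ("near match", term)
    return ("unrecognized", None)

# Hypothetical values.
for value in ["PreservedSpecimen", "Preserved Specimen", "specimen"]:
    print(value, check_basis_of_record(value))
```

Near matches can usually be replaced with the suggested term directly, while unrecognized values need a manual decision.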