Data Quality Toolkit 2024

From iDigBio
Revision as of 14:31, 16 February 2024


Overview

This page aggregates common data quality issues and potential solutions to them, using both collection management systems (CMSs) and CMS-agnostic tools. Issues are grouped by data category, and each issue links to resources for identifying and fixing it.

This page was inspired by Bob Mesibov's Data Cleaner's Cookbook.

If you already know which tool or CMS you are using to clean your data, you can visit a tool- and CMS-specific toolkit: Arctos, Excel, OpenRefine, Specify, Symbiota, TaxonWorks.

Catalog Numbers and Other Identifiers

Duplicate Catalog Numbers

Problem: The same catalog number is used multiple times within your dataset. (This duplication may or may not be intentional, depending on your collection's policies, but it is generally best to avoid duplicate catalog numbers when possible.)

Solutions:
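As a CMS-agnostic illustration, duplicate catalog numbers can be counted with Python's standard library. This is a minimal sketch, not a feature of any tool listed on this page. It assumes records are dictionaries keyed by Darwin Core terms (catalogNumber), ignores blank values, and strips surrounding whitespace so that "UF 123" and "UF 123 " count as the same number. The function name and sample catalog numbers are hypothetical.

```python
from collections import Counter


def duplicate_catalog_numbers(records):
    """Return {catalogNumber: count} for every number used more than once.

    Assumes each record is a dict keyed by Darwin Core terms. Blank or
    missing catalog numbers are ignored; surrounding whitespace is stripped.
    """
    counts = Counter()
    for rec in records:
        number = (rec.get("catalogNumber") or "").strip()
        if number:
            counts[number] += 1
    return {number: n for number, n in counts.items() if n > 1}


records = [
    {"catalogNumber": "UF 123"},
    {"catalogNumber": "UF 124"},
    {"catalogNumber": "UF 123 "},
    {"catalogNumber": ""},
    {},
]
print(duplicate_catalog_numbers(records))  # {'UF 123': 2}
```

Whether a reported duplicate is actually an error still depends on your collection's policies.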

Dates

Identified Date Earlier than Collected Date

Problem: The date the specimen was identified (dateIdentified) is earlier than the date the specimen was collected (eventDate).

Solutions:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
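Independent of any particular CMS, the comparison can be sketched in Python. This is a minimal sketch under stated assumptions. Records are dictionaries keyed by dateIdentified and eventDate. Only full YYYY-MM-DD dates are compared: a time after "T" is dropped, and values that do not parse as a full date (such as a year alone) are skipped. For intervals written as start/end, the start of eventDate is compared with the end of dateIdentified, so only records identified before the earliest possible collection date are flagged. Function names and sample data are hypothetical.

```python
from datetime import datetime


def parse_date(value, end=False):
    """Parse one side of an ISO 8601 date or start/end interval as YYYY-MM-DD.

    Returns None for blank values or anything that is not a full date,
    so partial dates are skipped rather than guessed.
    """
    if not value:
        return None
    parts = value.split("/")
    text = (parts[-1] if end else parts[0]).split("T")[0].strip()
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


def identified_before_collected(records):
    """Return catalog numbers whose dateIdentified precedes eventDate."""
    flagged = []
    for rec in records:
        identified = parse_date(rec.get("dateIdentified"), end=True)
        collected = parse_date(rec.get("eventDate"))
        if identified is not None and collected is not None and identified < collected:
            flagged.append(rec.get("catalogNumber"))
    return flagged


records = [
    {"catalogNumber": "A1", "eventDate": "2020-06-15", "dateIdentified": "2020-06-01"},
    {"catalogNumber": "A2", "eventDate": "2020-06-15", "dateIdentified": "2021-01-10"},
    {"catalogNumber": "A3", "eventDate": "2020-06-01/2020-06-20", "dateIdentified": "2020-06-10"},
    {"catalogNumber": "A4", "eventDate": "2020", "dateIdentified": "2019-01-01"},
]
print(identified_before_collected(records))  # ['A1']
```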

Geography

Improperly Negated Latitudes/Longitudes

Problem: The sign of the latitude (decimalLatitude) or longitude (decimalLongitude) does not match the hemisphere in which the recorded country lies. For example, all longitudes in the contiguous U.S. should be negative.

Solutions:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
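A sign check can be sketched in Python with a lookup of expected hemispheres. The EXPECTED_SIGNS table below is a hypothetical, deliberately tiny illustration. A country that straddles the equator, the prime meridian, or the 180th meridian cannot be reduced to a single expected sign on that axis and needs a real boundary check instead. Field names follow Darwin Core (country, decimalLatitude, decimalLongitude), and the function names are hypothetical.

```python
# Hypothetical lookup table: expected (latitude sign, longitude sign) by country.
# None means the country straddles that axis (e.g., Brazil crosses the equator),
# so that coordinate's sign cannot be checked from the country alone.
EXPECTED_SIGNS = {
    "Australia": (-1, 1),
    "Brazil": (None, -1),
    "Canada": (1, -1),
}


def sign(x):
    """Return 1 for positive, -1 for negative, and 0 for zero."""
    return (x > 0) - (x < 0)


def sign_mismatches(record):
    """Return the coordinate fields whose sign contradicts the record's country."""
    expected = EXPECTED_SIGNS.get(record.get("country"))
    if expected is None:
        return []  # country not in the lookup table: nothing to check
    problems = []
    for field, expected_sign in zip(("decimalLatitude", "decimalLongitude"), expected):
        value = record.get(field)
        if expected_sign is None or value in (None, ""):
            continue
        # Only a strictly opposite sign is flagged; 0 is left alone.
        if sign(float(value)) == -expected_sign:
            problems.append(field)
    return problems


print(sign_mismatches({"country": "Canada", "decimalLatitude": "45.42", "decimalLongitude": "75.70"}))
# ['decimalLongitude']
```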

Missing Latitudes/Longitudes

Problem: A record has a latitude value, but not a longitude value, or vice versa.

Solutions:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
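Detecting a half-entered coordinate pair is an exclusive-or test: flag a record when exactly one of the two fields is blank. Below is a minimal Python sketch, assuming dictionary records with Darwin Core field names. A value of 0 is treated as present, since 0 is a valid latitude or longitude. The function names are hypothetical.

```python
def is_blank(value):
    """Treat None and empty/whitespace-only strings as missing (0 is a real value)."""
    return value is None or str(value).strip() == ""


def half_missing_coordinates(records):
    """Return catalog numbers of records that have exactly one of the two coordinates."""
    flagged = []
    for rec in records:
        lat_missing = is_blank(rec.get("decimalLatitude"))
        lon_missing = is_blank(rec.get("decimalLongitude"))
        if lat_missing != lon_missing:  # exclusive or: one present, one absent
            flagged.append(rec.get("catalogNumber"))
    return flagged


records = [
    {"catalogNumber": "B1", "decimalLatitude": "29.65", "decimalLongitude": ""},
    {"catalogNumber": "B2", "decimalLatitude": None, "decimalLongitude": "-82.34"},
    {"catalogNumber": "B3", "decimalLatitude": "0", "decimalLongitude": "0"},
    {"catalogNumber": "B4"},
]
print(half_missing_coordinates(records))  # ['B1', 'B2']
```

Records with neither coordinate are not flagged, because the problem described here is specifically a pair with one half missing.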

Misspelled Geographic Unit Names

Problem: Geographic unit names (e.g., country, state, county) are misspelled, resulting in poor matching against existing geographic lists.

Solutions:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
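One CMS-agnostic way to find candidate misspellings is fuzzy string matching against a reference list. The sketch below uses difflib.get_close_matches from Python's standard library. The reference list is a hypothetical stand-in for a real gazetteer or CMS geography tree, the 0.8 similarity cutoff is an assumption you would tune, and the function name is hypothetical. Suggested matches should be reviewed by a person before any values are changed.

```python
import difflib

# Hypothetical reference list; in practice this would come from an
# authoritative gazetteer or the geography tree of your CMS.
REFERENCE_STATES = ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]


def suggest_corrections(values, reference, cutoff=0.8):
    """Map each value with no exact (case-insensitive) match to its closest
    reference name, or to None if nothing scores at or above `cutoff`."""
    canonical = {name.lower(): name for name in reference}
    suggestions = {}
    for value in values:
        key = value.strip().lower()
        if key in canonical:
            continue  # already a known name
        matches = difflib.get_close_matches(key, list(canonical), n=1, cutoff=cutoff)
        suggestions[value] = canonical[matches[0]] if matches else None
    return suggestions


print(suggest_corrections(["Arkansa", "alaska", "Xyzzy"], REFERENCE_STATES))
# {'Arkansa': 'Arkansas', 'Xyzzy': None}
```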

Taxonomy

Misspelled Taxonomic Names

Problem: Scientific names are misspelled, resulting in poor matching of taxonomic names to taxonomic databases.

Solutions:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
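Taxonomic names can be screened in a similar way using edit distance (Levenshtein distance), which counts the single-character insertions, deletions, and substitutions separating two strings. This minimal Python sketch assumes simple binomials (no authorship, hybrid markers, or infraspecific ranks), a hypothetical reference checklist standing in for a taxonomic database, and a threshold of two edits. All function names are hypothetical.

```python
# Hypothetical reference checklist; in practice, names would be checked
# against an authoritative taxonomic database.
REFERENCE_NAMES = ["Quercus alba", "Quercus rubra", "Acer saccharum"]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalize_name(name):
    """Collapse whitespace, capitalize the genus, lowercase the other words."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def flag_misspellings(names, reference, max_distance=2):
    """Map each name not in `reference` to its closest reference name within
    `max_distance` edits, or to None if no reference name is that close."""
    flagged = {}
    for name in names:
        cleaned = normalize_name(name)
        if cleaned in reference:
            continue  # exact match after normalization: nothing to fix
        best = min(reference, key=lambda ref: levenshtein(cleaned, ref))
        flagged[name] = best if levenshtein(cleaned, best) <= max_distance else None
    return flagged


print(flag_misspellings(["Quercus albaa", "acer  sacharum", "Quercus rubra"], REFERENCE_NAMES))
# {'Quercus albaa': 'Quercus alba', 'acer  sacharum': 'Acer saccharum'}
```

Because this sketch computes the distance to every reference name, it is only practical for small checklists; larger reference lists call for a more efficient lookup.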