Data Quality Toolkit 2024: Difference between revisions

From iDigBio
No edit summary
Line 15: Line 15:


'''How to FIND this Problem in Your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''[[Data Quality Toolkit 2024 Symbiota#Duplicate Catalog Numbers|Symbiota]]'''
* [[Data Quality Toolkit 2024 Symbiota#Duplicate Catalog Numbers|Symbiota]]
* '''TaxonWorks:'''
* TaxonWorks


'''How to FIX this Problem in your Dataset:'''
How to FIX this Problem in your Dataset:
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:'''
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


== Dates ==
Line 36: Line 36:


'''How to FIND this Problem in Your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:'''
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


'''How to FIX this Problem in your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:'''
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


== Geography ==
Line 57: Line 57:


'''How to FIND this Problem in Your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:'''
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


'''How to FIX this Problem in your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:'''
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


=== Missing Latitudes/Longitudes ===
Line 76: Line 76:


'''How to FIND this Problem in Your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:''' Use the [https://biokic.github.io/symbiota-docs/editor/edit/ Record Search form]. For Custom Field 1, select Decimal Latitude IS NULL. For Custom Field 2, select Decimal Longitude IS NOT NULL. Then conduct a similar search with Decimal Latitude IS NOT NULL and Decimal Longitude IS NULL.
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


'''How to FIX this Problem in your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:''' No batch fixing possible. You will need to review the records and either add lat/long values or remove the orphaned lat/long values.
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


=== Misspelled Geographic Unit Names ===
Line 95: Line 95:


'''How to FIND this Problem in Your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:''' Use the [https://biokic.github.io/symbiota-docs/coll_manager/data_cleaning/geography/ Geography Cleaning Tools]
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


'''How to FIX this Problem in your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:''' Use the [https://biokic.github.io/symbiota-docs/coll_manager/data_cleaning/geography/ Geography Cleaning Tools]
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


== Taxonomy ==
Line 116: Line 116:


'''How to FIND this Problem in Your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:''' Use the [https://biokic.github.io/symbiota-docs/coll_manager/data_cleaning/taxonomy/ Taxonomic Cleaning Tool]
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks


'''How to FIX this Problem in your Dataset:'''
* '''Arctos:'''
* Arctos
* '''Excel:'''
* Excel
* '''OpenRefine'''
* OpenRefine
* '''Specify:'''
* Specify
* '''Symbiota:''' Use the [https://biokic.github.io/symbiota-docs/coll_manager/data_cleaning/taxonomy/ Taxonomic Cleaning Tool]
* Symbiota
* '''TaxonWorks:'''
* TaxonWorks

Revision as of 13:51, 16 February 2024


Overview

This page aggregates common data quality issues, and potential solutions to them, for collection management systems (CMS) and CMS-agnostic tools. Issues are grouped by data category, and tutorials are provided for (1) identifying and (2) fixing each issue.

This page was inspired by Bob Mesibov's Data Cleaner's Cookbook.

Catalog Numbers and Other Identifiers

Duplicate Catalog Numbers

Problem: The same catalog number is used multiple times within your dataset. (This may or may not be intentional, depending on your collection's policies. It is generally best to avoid duplicating catalog numbers when possible.)

How to FIND this Problem in Your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
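
For data exported outside the tools listed above, the check can be sketched in a few lines of Python. This is a minimal sketch, not a feature of any listed system: it assumes records are available as dictionaries (for example, rows from `csv.DictReader`) and that the identifier is stored under the Darwin Core term `catalogNumber`. The function name and the sample values are hypothetical.

```python
from collections import Counter


def find_duplicate_catalog_numbers(records, field="catalogNumber"):
    """Return {catalog number: count} for every value used more than once."""
    # Trim whitespace so "A-1 " and "A-1" count as the same value.
    values = [(r.get(field) or "").strip() for r in records]
    # Blank values are skipped rather than reported as duplicates of each other.
    counts = Counter(v for v in values if v)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample records (assumption: one dict per specimen record).
catalog_sample = [
    {"catalogNumber": "A-1"},
    {"catalogNumber": "A-2"},
    {"catalogNumber": "A-1 "},
    {"catalogNumber": ""},
]
duplicates = find_duplicate_catalog_numbers(catalog_sample)
print(duplicates)
```

Whether a reported duplicate is actually an error still depends on the collection's policies, so the output is a review list rather than a list of mistakes.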

How to FIX this Problem in your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks

Dates

Identified Date Earlier than Collected Date

Problem: The date the specimen was identified (dateIdentified field) is earlier than the date the specimen was collected (eventDate).

How to FIND this Problem in Your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
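
As a tool-agnostic illustration, the comparison between `dateIdentified` and `eventDate` can be sketched in Python. The sketch assumes both fields hold complete dates in `YYYY-MM-DD` form; partial dates, date ranges, and blanks are skipped rather than guessed at. The function name and sample records are hypothetical.

```python
from datetime import date


def identified_before_collected(records):
    """Return records whose dateIdentified is earlier than their eventDate."""
    flagged = []
    for r in records:
        try:
            identified = date.fromisoformat(r["dateIdentified"])
            collected = date.fromisoformat(r["eventDate"])
        except (KeyError, TypeError, ValueError):
            # Missing, blank, or unparseable dates cannot be compared here.
            continue
        if identified < collected:
            flagged.append(r)
    return flagged


# Hypothetical sample records.
date_sample = [
    {"catalogNumber": "A-1", "eventDate": "2001-06-15", "dateIdentified": "1999-03-02"},
    {"catalogNumber": "A-2", "eventDate": "2001-06-15", "dateIdentified": "2003-01-10"},
    {"catalogNumber": "A-3", "eventDate": "", "dateIdentified": "2003-01-10"},
]
date_flagged = identified_before_collected(date_sample)
print([r["catalogNumber"] for r in date_flagged])
```

Skipped records are not necessarily clean; they simply need a different check, such as one that handles partial dates.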

How to FIX this Problem in your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks

Geography

Improperly Negated Latitudes/Longitudes

Problem: The sign of the latitude (decimalLatitude) or longitude (decimalLongitude) does not match the hemisphere of the given country. For example, longitudes in the contiguous U.S. should all be negative.

How to FIND this Problem in Your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
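
One way to picture the check is a lookup of the expected coordinate signs per country, sketched here in Python. The lookup table is a small illustrative subset chosen by assumption (countries lying entirely within one hemisphere on both axes), and the `country` field name, function names, and sample records are hypothetical. A real table would need to handle countries that span hemispheres.

```python
# Assumed lookup: country -> (expected latitude sign, expected longitude sign).
# Illustrative subset only; countries missing from the table are not checked.
EXPECTED_SIGNS = {
    "Canada": (1, -1),
    "Australia": (-1, 1),
    "Chile": (-1, -1),
}


def sign(x):
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def wrong_sign_records(records):
    """Return records whose coordinate signs disagree with the lookup table."""
    flagged = []
    for r in records:
        expected = EXPECTED_SIGNS.get(r.get("country"))
        if expected is None:
            continue
        try:
            lat = float(r["decimalLatitude"])
            lon = float(r["decimalLongitude"])
        except (KeyError, TypeError, ValueError):
            # Missing or non-numeric coordinates are a separate problem.
            continue
        # A coordinate of exactly 0 has sign 0 and is therefore also flagged.
        if (sign(lat), sign(lon)) != expected:
            flagged.append(r)
    return flagged


# Hypothetical sample records.
sign_sample = [
    {"country": "Canada", "decimalLatitude": "45.4", "decimalLongitude": "75.7"},
    {"country": "Australia", "decimalLatitude": "-33.9", "decimalLongitude": "151.2"},
    {"country": "Peru", "decimalLatitude": "12.0", "decimalLongitude": "-77.0"},
]
sign_flagged = wrong_sign_records(sign_sample)
print([r["country"] for r in sign_flagged])
```

Flagged records should be reviewed before any sign is flipped, because the country value rather than the coordinate may be the field in error.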

How to FIX this Problem in your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks

Missing Latitudes/Longitudes

Problem: A record has a latitude value but no longitude value, or a longitude value but no latitude value.

How to FIND this Problem in Your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
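
Outside any particular system, an orphaned coordinate is simply a record where exactly one of the two fields is filled in. The following Python sketch shows that test under the assumption that blank strings and `None` both mean "no value"; the function names and sample records are hypothetical.

```python
def has_value(v):
    """True if v is neither None nor a blank string."""
    return v is not None and str(v).strip() != ""


def orphaned_coordinates(records):
    """Return records with a latitude but no longitude, or the reverse."""
    return [
        r for r in records
        # The two booleans differ only when exactly one coordinate is present.
        if has_value(r.get("decimalLatitude")) != has_value(r.get("decimalLongitude"))
    ]


# Hypothetical sample records.
coord_sample = [
    {"catalogNumber": "A-1", "decimalLatitude": "29.65", "decimalLongitude": ""},
    {"catalogNumber": "A-2", "decimalLatitude": None, "decimalLongitude": "-82.32"},
    {"catalogNumber": "A-3", "decimalLatitude": "29.65", "decimalLongitude": "-82.32"},
    {"catalogNumber": "A-4", "decimalLatitude": "", "decimalLongitude": ""},
]
coord_flagged = orphaned_coordinates(coord_sample)
print([r["catalogNumber"] for r in coord_flagged])
```

Records with neither coordinate are deliberately not flagged, since an ungeoreferenced record is a different situation from a half-entered coordinate pair.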

How to FIX this Problem in your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks

Misspelled Geographic Unit Names

Problem: The geographic units (e.g., country, state, county) are misspelled, resulting in poor matching of geographic unit names to existing geographic lists.

How to FIND this Problem in Your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks
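
Matching names against an existing geographic list can be illustrated with fuzzy string comparison from Python's standard `difflib` module. This is a sketch under stated assumptions, not the method used by any tool listed above: the reference list, the similarity cutoff of 0.8, the function name, and the sample names are all hypothetical.

```python
import difflib


def suggest_corrections(names, reference, cutoff=0.8):
    """Map each name missing from the reference list to its closest entry, or None."""
    known = set(reference)
    suggestions = {}
    for name in sorted(set(names)):
        if name in known:
            continue  # exact match: nothing to correct
        # get_close_matches returns the best candidates scoring at least `cutoff`.
        matches = difflib.get_close_matches(name, reference, n=1, cutoff=cutoff)
        suggestions[name] = matches[0] if matches else None
    return suggestions


# Hypothetical reference list and sample values for a stateProvince-like field.
reference_states = ["Florida", "Georgia", "Alabama"]
state_sample = ["Florida", "Flordia", "Gorgia", "Xyz"]
state_suggestions = suggest_corrections(state_sample, reference_states)
print(state_suggestions)
```

A suggestion of `None` means no reference entry was similar enough, and every suggestion should be confirmed by a person before the data are changed.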

How to FIX this Problem in your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks

Taxonomy

Misspelled Taxonomic Names

Problem: Scientific names are misspelled, resulting in poor matching of taxonomic names to taxonomic databases.

How to FIND this Problem in Your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks

How to FIX this Problem in your Dataset:

  • Arctos
  • Excel
  • OpenRefine
  • Specify
  • Symbiota
  • TaxonWorks