Excel Data Quality Toolkit

'''Problem:''' The geographic units (e.g., [https://dwc.tdwg.org/terms/#dwc:country country], [https://dwc.tdwg.org/terms/#dwc:stateProvince state/province], [https://dwc.tdwg.org/terms/#dwc:county county]) are misspelled, resulting in poor matching of geographic unit names to existing geographic lists.


'''Solution:''' Avoid this issue by using a pick list (drop-down list) as data are entered. For a tutorial, see [https://support.microsoft.com/en-us/office/create-a-drop-down-list-7693307a-59ef-400a-b769-c5402dce407b Create a drop-down list].
 
If data have already been entered, check for non-standard values with the formula below, where A1 is the geographic unit value being tested and the list of valid values is on the ''geographic unit'' tab in cells A1 through A11. Any value in column A that does not match a value in that list returns #N/A.
 
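'''WARNING!''' Note the '''$''' in front of the column letters and row numbers of the list range ($A$1:$A$11). The $ makes the reference absolute, so it stays fixed when the formula is copied down the column. Without it, the range is relative and shifts down one row for every row the formula is copied (A1:A11 becomes A2:A12, then A3:A13, and so on). Later rows are then tested against a list that has lost its first entries and picked up blank cells below row 11 (or whatever the last row of your list is), which produces false errors. Make sure those $ signs are in place!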
 
=VLOOKUP(A1,'geographic unit'!$A$1:$A$11,1,FALSE)
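Outside Excel, the same exact-match check can be sketched in Python. This is a minimal illustration, not part of the toolkit: it assumes the data column and the list of valid names have been exported as plain lists of text, and the function names <code>vlookup_exact</code> and <code>relative_range</code> are hypothetical. The second function shows how the unanchored range A1:A11 drifts as the formula is copied down, which is why the $ signs matter.

```python
# Minimal sketch of =VLOOKUP(A1,'geographic unit'!$A$1:$A$11,1,FALSE).
# Assumption: values were exported from Excel as lists of strings.
# Function names here are hypothetical, chosen for illustration.

NA = "#N/A"  # stands in for Excel's #N/A error


def vlookup_exact(value, valid_list):
    """Return the matching entry from valid_list, or '#N/A' if none matches.

    Like VLOOKUP with FALSE, a match must be exact except for letter case,
    which is ignored (casefold approximates Excel's case-insensitive
    comparison). Spaces are NOT trimmed, and wildcards (* ?) are not emulated.
    """
    target = value.casefold()
    for entry in valid_list:
        if entry.casefold() == target:
            return entry  # column 1 of the range, i.e. the list's own spelling
    return NA


def relative_range(first_row, last_row, rows_copied_down):
    """Where the unanchored range A1:A11 points after copying the formula down."""
    return f"A{first_row + rows_copied_down}:A{last_row + rows_copied_down}"


if __name__ == "__main__":
    # Hypothetical list of valid names and a column of entered values
    valid_units = ["Canada", "Mexico", "United States"]
    entered = ["Canada", "Mexcio", "canada", "Canada ", "United States"]
    for row, value in enumerate(entered, start=1):
        print(f"row {row}: {value!r} -> {vlookup_exact(value, valid_units)}")

    # Without $, each row the formula is copied down shifts the range by one
    print(relative_range(1, 11, 1))   # A2:A12
    print(relative_range(1, 11, 11))  # A12:A22, entirely below the list
```

As in Excel, ''canada'' matches ''Canada'', but ''Canada '' with a trailing space does not, so stray spaces show up as #N/A just like misspellings.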
 
The same process can be used to fix the errors. To get a list of unique values, copy the entire geographic unit column to column A of a new tab (called ''unique geographic unit'' in the formula below), select the copied data, and from the main Excel menu choose Data > Remove Duplicates. In column B, enter the correct geographic unit name for every value in column A, even if the two are the same.
 
Then use a VLOOKUP like the one above to get the correct geographic unit name for every row in your file:
 
=VLOOKUP(A1,'unique geographic unit'!$A$1:$B$11,2,FALSE)
 
Here the unique values to check are in column A of the ''unique geographic unit'' tab and the correct replacement values are in column B. The third argument, 2, tells VLOOKUP to return the value from the second column of the range $A$1:$B$11, i.e. column B. As before, change 11 to the last row of your list.
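The whole clean-up workflow (unique list, correction column, lookup) can also be sketched in Python. Again, this is a hypothetical illustration rather than part of the toolkit: it assumes the column has been exported as a list of text values, the sample misspellings and corrections are made up, and unlike Excel's VLOOKUP the dictionary lookup here is case-sensitive.

```python
# Minimal sketch of the clean-up workflow:
#   1. unique values (Data > Remove Duplicates) -> column A of a new tab
#   2. correct name for every unique value     -> column B of that tab
#   3. =VLOOKUP(A1,'unique geographic unit'!$A$1:$B$11,2,FALSE) for each row
# Assumption: the column was exported as a list of strings; the sample
# misspellings and corrections below are hypothetical.

NA = "#N/A"  # stands in for Excel's #N/A error


def unique_values(column):
    """Unique values in order of first appearance (exact, case-sensitive)."""
    return list(dict.fromkeys(column))


def corrected_column(column, corrections):
    """Return the corrected name for every row, or '#N/A' if a value has
    no entry in the corrections table. Unlike VLOOKUP, this dictionary
    lookup is case-sensitive."""
    return [corrections.get(value, NA) for value in column]


if __name__ == "__main__":
    entered = ["Mexcio", "Canada", "Mexico", "Mexcio", "U.S.A."]

    uniques = unique_values(entered)
    print(uniques)  # ['Mexcio', 'Canada', 'Mexico', 'U.S.A.']

    # Column B: the correct name for every unique value, even if unchanged
    corrections = {
        "Mexcio": "Mexico",
        "Canada": "Canada",
        "Mexico": "Mexico",
        "U.S.A.": "United States",
    }
    missing = [v for v in uniques if v not in corrections]
    print(missing)  # []

    print(corrected_column(entered, corrections))
    # ['Mexico', 'Canada', 'Mexico', 'Mexico', 'United States']
```

Listing a correction for every unique value, including unchanged ones, means any value missing from the table comes back as #N/A instead of silently passing through.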


== Taxonomy ==