Data Ingestion Guidance: Difference between revisions

Line 133: Line 133:


===Data recommendations for optimal searchability and applicability in the aggregate===
Optimizing the search experience means that data need to be as consistent and regular as possible. To that end, iDigBio constructs an '''index layer''' to accompany your as-offered 'raw' data. The results of that index-building exercise are reflected in the data quality flag report that accompanies every ingested dataset. When taxon ranks are missing, the scientific name is matched to the [http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c GBIF backbone taxonomy], and when an exact or fuzzy match is found, it is used as the authority to fill in and regularize the taxonomic information in the portal specimen record.
Optimizing the search experience means that data need to be as consistent and regular as possible. To that end, iDigBio constructs an '''index layer''' to accompany your as-offered 'raw' data. The results of that index-building exercise are reflected in the data quality flag report that accompanies every ingested dataset. The scientific name is matched to the [http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c GBIF backbone taxonomy] to correct typos and older names. When an exact or fuzzy match is found, it is used as the authority to fill in and regularize the taxonomic information in the index layer of the specimen record. Kingdom, when provided, is used to keep a record from shifting to a different kingdom when the given rank and scientific name alone would force such a change. If too few clues are supplied, an identification can land in a completely different place in the taxonomy tree than the provider intended. We encourage providers to supply GBIF with lists and corrections to help them keep the backbone up to date.
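The matching behavior described above can be illustrated with a minimal sketch. This is not iDigBio's or GBIF's actual implementation: the three-entry <code>BACKBONE</code> list, the <code>match_name</code> function, the <code>matchType</code> key, and the 0.85 similarity cutoff are all hypothetical, and Python's standard <code>difflib</code> stands in for whatever fuzzy matcher the real service uses. It only shows why a supplied kingdom helps: the genus name ''Morus'' exists in both Plantae (mulberries) and Animalia (gannets), so the name alone is ambiguous.

```python
import difflib

# Hypothetical miniature "backbone"; the real GBIF backbone is far larger.
BACKBONE = [
    {"scientificName": "Morus", "kingdom": "Plantae", "family": "Moraceae"},
    {"scientificName": "Morus", "kingdom": "Animalia", "family": "Sulidae"},
    {"scientificName": "Puma concolor", "kingdom": "Animalia", "family": "Felidae"},
]


def match_name(scientific_name, kingdom=None, cutoff=0.85):
    """Return one backbone entry for the name, or None if no safe match exists."""
    names = sorted({entry["scientificName"] for entry in BACKBONE})

    if scientific_name in names:
        matched, match_type = scientific_name, "exact"
    else:
        # Fuzzy step: tolerate small typos in the provided name.
        close = difflib.get_close_matches(scientific_name, names, n=1, cutoff=cutoff)
        if not close:
            return None
        matched, match_type = close[0], "fuzzy"

    candidates = [e for e in BACKBONE if e["scientificName"] == matched]

    # Kingdom guard: never accept a candidate from a different kingdom
    # than the one the provider supplied.
    if kingdom is not None:
        candidates = [e for e in candidates if e["kingdom"].lower() == kingdom.lower()]

    # Zero candidates (guard rejected them) or several (homonyms) is not a safe match.
    if len(candidates) != 1:
        return None
    return dict(candidates[0], matchType=match_type)


if __name__ == "__main__":
    print(match_name("Puma concolour"))             # typo, resolved by the fuzzy step
    print(match_name("Morus"))                      # ambiguous without a kingdom
    print(match_name("Morus", kingdom="Animalia"))  # kingdom resolves the homonym
```

In this sketch an ambiguous or kingdom-conflicting name is simply left unmatched; a real service may resolve such cases differently, which is why supplying kingdom and rank with every record matters.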
We support and encourage you to use the GBIF-recommended set of occurrence record fields found here: http://www.gbif.org/publishing-data/quality. Our list of required fields is short ([[occurrenceID]], [[institutionCode]], [[scientificName]], [[kingdom]], [[taxonRank]], [[basisOfRecord]]), but we strongly recommend that you address as many of the fields listed below as possible. See below for further '[https://www.idigbio.org/wiki/index.php/Data_Ingestion_Guidance#Taxonomy Taxonomy]' information.
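Before submitting a dataset, a provider might pre-check records for the six required fields with something like the sketch below. The field names come from the list above; the <code>missing_required_fields</code> helper and the choice to treat blank or whitespace-only values as missing are assumptions made for illustration, not a description of iDigBio's ingestion checks.

```python
# Required fields as named in the guidance above.
REQUIRED_FIELDS = (
    "occurrenceID",
    "institutionCode",
    "scientificName",
    "kingdom",
    "taxonRank",
    "basisOfRecord",
)


def missing_required_fields(record):
    """List the required fields that are absent or blank in one record (a dict)."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Assumption: None, empty, and whitespace-only values all count as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


if __name__ == "__main__":
    record = {
        "occurrenceID": "urn:uuid:example",  # placeholder identifier
        "institutionCode": "XYZ",            # placeholder code
        "scientificName": "Puma concolor",
        "kingdom": "",
        "basisOfRecord": "PreservedSpecimen",
    }
    print(missing_required_fields(record))
```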
====Data ownership====