Data Ingestion Guidance: Difference between revisions

Line 133:


===Data recommendations for optimal searchability and applicability in the aggregate===
We optimize the search experience by making data as consistent and regular as possible. To that end, iDigBio constructs an '''index''' to accompany your as-offered 'raw' data. The results of that index-building exercise are reflected in the data quality flag report that accompanies every ingested dataset. The ''scientific name'' is matched to the [http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c GBIF backbone taxonomy] to correct typos and update older names. When an exact or fuzzy match is found, it is used as the authority to fill in and regularize the taxonomic information in the index layer of the specimen record. ''Kingdom'', when provided, is used to prevent a shift to a different kingdom in the event that the given rank and scientific name would otherwise force a change. If not enough clues are found, an identification can land in a completely different place in the taxonomy tree than the provider intended. We encourage providers to supply GBIF with lists and corrections to help GBIF keep the backbone up to date.
We optimize the search experience by making data as consistent and regular as possible. To that end, iDigBio constructs an '''index''' to accompany your as-offered 'raw' data. The results of that index-building exercise are reflected in the data quality flag report that accompanies every ingested dataset. The ''scientific name'' is matched to the [http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c GBIF backbone taxonomy] to correct typos and update older names. When an exact or fuzzy match is found, it is used as the authority to fill in and regularize the taxonomic information in the indexed version of the specimen record. ''Kingdom'', when provided, is used to prevent a shift to a different kingdom in the event that the given rank and scientific name would otherwise force a change. If not enough clues are found, an identification can land in a completely different place in the taxonomy tree than the provider intended. We encourage providers to supply GBIF with lists and corrections to help GBIF keep the backbone up to date.
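The role of ''kingdom'' as a guard during name matching can be illustrated with a minimal sketch. This is not iDigBio's or GBIF's implementation: the tiny <code>BACKBONE</code> list, the <code>match_name</code> function, and the similarity cutoff are all hypothetical, and fuzzy matching is approximated here with Python's standard <code>difflib</code> module.

```python
from difflib import get_close_matches

# Hypothetical stand-in for a backbone taxonomy: (canonical name, kingdom).
# "Morus" is a real cross-kingdom homonym (mulberries and gannets).
BACKBONE = [
    ("Morus", "Plantae"),
    ("Morus", "Animalia"),
    ("Puma concolor", "Animalia"),
    ("Quercus alba", "Plantae"),
]


def match_name(scientific_name, kingdom=None, cutoff=0.85):
    """Return (name, kingdom) for the closest backbone entry, or None.

    An exact or fuzzy match on the name is accepted only if it does not
    contradict the kingdom supplied by the provider (when one is supplied).
    """
    names = sorted({name for name, _ in BACKBONE})
    close = get_close_matches(scientific_name, names, n=1, cutoff=cutoff)
    if not close:
        return None
    candidates = [entry for entry in BACKBONE if entry[0] == close[0]]
    if kingdom is not None:
        # Kingdom guard: drop candidates that would move the record
        # into a different kingdom than the one provided.
        candidates = [e for e in candidates if e[1].lower() == kingdom.lower()]
    return candidates[0] if candidates else None
```

In this sketch, <code>match_name("Morus", "Animalia")</code> resolves to the animal genus, while <code>match_name("Morus")</code> without a kingdom simply takes the first candidate and may land in the wrong part of the tree, which is the failure mode described above.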
We support and encourage the use of the GBIF-recommended set of occurrence record fields found here: http://www.gbif.org/publishing-data/quality. Our list of required fields is short ([[occurrenceID]], [[institutionCode]], [[scientificName]], [[kingdom]], [[taxonRank]], [[basisOfRecord]]), but we strongly recommend that you address as many of the fields below as possible. See below for further '[https://www.idigbio.org/wiki/index.php/Data_Ingestion_Guidance#Taxonomy Taxonomy]' information.
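Before submitting a dataset, a provider could check each record against the required fields listed above. The following is a minimal sketch, assuming records are available as plain dictionaries keyed by field name; the helper name <code>missing_required_fields</code> is hypothetical and not part of any iDigBio tool.

```python
# Required fields as listed in the guidance above.
REQUIRED_FIELDS = (
    "occurrenceID",
    "institutionCode",
    "scientificName",
    "kingdom",
    "taxonRank",
    "basisOfRecord",
)


def missing_required_fields(record):
    """Return the required field names that are absent or blank in record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat None, empty strings, and whitespace-only values as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

For example, a record that supplies only <code>occurrenceID</code> and <code>scientificName</code> would be reported as missing the other four fields.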
====Data stewardship / ownership====