Data Ingestion Guidance

===Data recommendations for optimal searchability and applicability in the aggregate===
Optimizing the search experience means that data need to be as consistent and regular as possible. To that end, iDigBio constructs an index layer to accompany your as-offered 'raw' data. The results of that index-building exercise are reflected in the data quality flag report that accompanies every ingested dataset. When taxon ranks are missing, the scientific name is matched against the [http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c GBIF backbone taxonomy]; when an exact or fuzzy match is found, that match is used as the authority to fill in and regularize the taxonomic information in the portal specimen record.
We support and encourage the use of the GBIF recommended set of occurrence record fields, found here: http://bid.gbif.org/en/community/data-quality/#occurrence. Our list of required fields is short (occurrenceID, basisOfRecord, institutionCode, scientificName, kingdom), but we strongly recommend that you address as many of the fields below as possible and read the GBIF BID document carefully.
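As a pre-submission check, the short list of required fields named above can be tested against each record before a dataset is offered for ingestion. The following is a minimal sketch, not part of iDigBio's tooling: it assumes each occurrence record has already been loaded as a Python dictionary keyed by field name, and the names <code>REQUIRED_FIELDS</code> and <code>missing_required_fields</code> are hypothetical.

```python
# Required fields as listed in this guidance; the constant name is an
# illustrative assumption, not an iDigBio identifier.
REQUIRED_FIELDS = (
    "occurrenceID",
    "basisOfRecord",
    "institutionCode",
    "scientificName",
    "kingdom",
)


def missing_required_fields(record):
    """Return the required field names that are absent or blank in one record."""
    missing = []
    for field in REQUIRED_FIELDS:
        # Treat a missing key, None, and whitespace-only text as "not filled in".
        value = str(record.get(field) or "").strip()
        if not value:
            missing.append(field)
    return missing


if __name__ == "__main__":
    example = {
        "occurrenceID": "urn:catalog:EXAMPLE:1",  # made-up identifier
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Quercus alba",
        "kingdom": " ",
    }
    print(missing_required_fields(example))
```

Running the check over a whole export and listing the records with a non-empty result gives a quick view of what to fix before reading the data quality flag report.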
====Data ownership====
*'''[[institutionCode]]''' and '''ownerInstitutionCode''': if you use ownerInstitutionCode in your data, we recommend that you also fill in institutionCode. The former is typically used to indicate that the specimen is at location 'x' while the record is being provided by institution 'y'. While we do not require the use of institutionCode, it is likely to be the most agreed-upon searchable value, given the disparities in how a precise institution name is written. Use it consistently in your occurrence records and follow the Index Herbariorum or the ASIH codes.
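The two recommendations in this item (fill in institutionCode whenever ownerInstitutionCode is used, and use institutionCode consistently) can be checked with a few lines of code. This is a minimal sketch under the assumption that records are Python dictionaries keyed by field name; the function names are hypothetical, and treating codes that differ only in letter case or surrounding spaces as "inconsistent" is an illustrative choice, not an iDigBio rule.

```python
from collections import defaultdict


def _text(record, field):
    """Return the field as stripped text, or '' when absent or None."""
    return str(record.get(field) or "").strip()


def owner_without_institution(records):
    """Indexes of records that set ownerInstitutionCode but leave institutionCode blank."""
    return [
        index
        for index, record in enumerate(records)
        if _text(record, "ownerInstitutionCode") and not _text(record, "institutionCode")
    ]


def inconsistent_institution_codes(records):
    """Map each upper-cased code to its spellings, keeping only codes spelled more than one way."""
    spellings = defaultdict(set)
    for record in records:
        raw = record.get("institutionCode")
        if raw is None or not str(raw).strip():
            continue
        # Group on a normalized key but remember the spelling exactly as supplied.
        spellings[str(raw).strip().upper()].add(str(raw))
    return {
        code: sorted(variants)
        for code, variants in spellings.items()
        if len(variants) > 1
    }
```

A non-empty result from either function points at records worth correcting at the source before the dataset is published.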