iDigBio Data Ingestion Requirements and Guidelines

Thu, 2013-05-30 15:04 -- godfoder

This guide describes the formats and requirements for ingesting data into iDigBio.

Supported File Formats: iDigBio strives to make data ingestion into our infrastructure as easy as possible. To that end, we have identified two lowest-common-denominator export file formats that we will initially support for dataset ingestion.

The first of these formats is CSV, which most databasing software can export with very little work. When using CSV files, take care with free-form text fields: all line breaks and quotes must be escaped, and any field containing a comma must be enclosed in quotes. Visually inspect the exported file to verify that any diacritics in the data have been preserved, preferably by exporting in UTF-8 encoding.
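As an illustration of the CSV precautions above, here is a minimal sketch using Python's standard csv module. It assumes the common quoting convention in which embedded quotes are doubled and any field containing a comma, quote, or line break is wrapped in quotes; whether your databasing software uses this same convention is something to check. The function names, the sample record, and its values are hypothetical and are not part of any iDigBio tooling.

```python
import csv
import io

# Hypothetical sample record: the values are invented for illustration only.
FIELDNAMES = ["catalogNumber", "recordedBy", "occurrenceRemarks"]
SAMPLE_ROWS = [
    {
        "catalogNumber": "1234",
        "recordedBy": "José Müller",  # diacritics that must survive export
        "occurrenceRemarks": 'Label reads "Type"; near river, left bank\nSecond line',
    }
]


def rows_to_csv(rows, fieldnames):
    """Serialize rows to CSV text, quoting only the fields that need it."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


def export_utf8(rows, fieldnames, path):
    """Write the CSV to disk as UTF-8; newline="" leaves line endings to csv."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(rows_to_csv(rows, fieldnames))


# Round-trip check: commas, quotes, line breaks, and diacritics all survive.
assert csv_to_rows(rows_to_csv(SAMPLE_ROWS, FIELDNAMES)) == SAMPLE_ROWS
```

Reading the exported file back and comparing it with the source records, as the final line does, is a programmatic complement to the visual inspection recommended above.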

The second format is the Darwin Core Archive file.

Topics in the guide include:

  • Supported File Formats
  • Data Export Process
  • Getting Your Data to iDigBio
  • Appendix A - Darwin Core Archive Processing
  • Appendix B - CSV Processing
  • Appendix C - Helpful Links for working with IPT and Darwin Core Archives
Standards Documentation: