Data Ingestion Guidance: Difference between revisions

From iDigBio
=== Sample Scenarios of Data Transformations to Prepare Data for Ingestion  ===


*[https://www.idigbio.org/wiki/index.php/Example_of_trivial_transformations_on_INHS_fish_dataset Example preparing specimen data from Illinois Natural History survey (INHS) fish collection from FileMakerPro]

Revision as of 07:35, 6 July 2013

Preliminary Guidance When First Considering iDigBio Data Ingestion:

Below are a few things we ask of the data to make it fit for use in the cyberinfrastructure we are building:

  • For all data records
  1. Each specimen record needs to have a GUID (a persistent, globally unique identifier) in its digital record.
  2. If you are the source of the data, you need to own it; if you are an aggregator, you need to have the owner's permission to send it to us.
  3. We would like the data to be available to our harvester via IPT and RSS if possible; otherwise, a CSV file in Darwin Core format would work too.
  4. Dates should be in ISO format, i.e., YYYY-MM-DD.
  5. Take care to preserve diacritics in people and place names.
  • For all images/media objects
  1. Each media record needs to have a GUID (a persistent, globally unique identifier).
  2. We need an Audubon Core metadata file with one record for each media record, and we can provide coaching to help you create that file. The more detail you provide about each image, the more retrievable it will be.
  3. As with catalog records, media records need to be provided freely and with permission, and each record needs to carry at least a Creative Commons "CC BY" license.
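
As a rough illustration of the record-level items above (GUID, ISO dates, diacritics), here is a minimal Python sketch. It is not an iDigBio tool. The helper names, the `urn:uuid:` identifier form, and the assumed source date format (month/day/year) are all assumptions made for this example; the column names are Darwin Core terms. A GUID generated this way is only persistent if it is stored back in the source database and reused on every export.

```python
import csv
import io
import uuid
from datetime import datetime


def to_iso_date(text, source_format="%m/%d/%Y"):
    """Convert a date in an assumed source format to YYYY-MM-DD."""
    return datetime.strptime(text.strip(), source_format).date().isoformat()


def ensure_guid(record, field="occurrenceID"):
    """Assign a UUID-based identifier only when the record has none.

    The urn:uuid: form is an assumption for this sketch; an existing
    identifier is never overwritten, so it stays persistent.
    """
    if not record.get(field):
        record[field] = "urn:uuid:" + str(uuid.uuid4())
    return record


def write_csv(records, fieldnames):
    """Serialize records as UTF-8 encoded CSV so diacritics survive."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue().encode("utf-8")


# Hypothetical record, invented for illustration only.
record = {
    "occurrenceID": "",
    "eventDate": "7/6/2013",
    "recordedBy": "José Peña",
    "locality": "Río Paraná",
}
record["eventDate"] = to_iso_date(record["eventDate"])
record = ensure_guid(record)
csv_bytes = write_csv([record], list(record.keys()))
```

Writing the file explicitly as UTF-8, rather than relying on a spreadsheet program's default export encoding, is one way to keep names such as "José Peña" intact.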

The methods for linking the catalog records to the media records are in this document, as well as explanation about creating GUIDs for the records:

Data Ingestion Requirements and Guidelines are here:

Additional info about image format is here:

If you are contemplating writing a proposal (e.g., to NSF) and want to coordinate your data with iDigBio:

If you are brand new to iDigBio and looking for some entry-level info, try here:

Sample Scenarios of Data Transformations to Prepare Data for Ingestion