Data Ingestion Guidance: Difference between revisions

no edit summary
m (Reverted edits by Dstoner (talk) to last revision by Joanna)
No edit summary
Line 199: Line 199:
==Packaging for images / media objects - identifiers==
==Packaging for images / media objects - identifiers==
Consult iDigBio's media policy: https://www.idigbio.org/content/idigbio-image-file-format-requirements-and-recommendations-1 while preparing your media.
Consult iDigBio's media policy: https://www.idigbio.org/content/idigbio-image-file-format-requirements-and-recommendations-1 while preparing your media.
*Firstly, adding a field in the occurrence file for ''associatedMedia'' is not the way to include media with a specimen record. Media that comes to us via this method, or embedded in a webpage will not get the usual handling.
*Firstly, adding a field in the occurrence file for ''associatedMedia'' is the least robust method to relate media with a specimen record and iDigBio discourages this practice. Media that is provided with sufficient metadata will be more useful for downstream users and receive additional handling such as thumbnails in the iDigBio portal. In the future, media may be searchable in iDigBio based on the provided media metadata.
*Each media record should have a unique (within the dataset) identifier in the ''dc:identifier'' field.
*Each media record should have a unique and persistent identifier in the column defined by http://purl.org/dc/terms/identifier
*Columns are defined in the meta.xml so the column headers in the multimedia file itself are a convenience but not actually significant to the meaning or processing of the column.
 
A pristine sample of a minimally-populated AC CSV published via an extension in a Darwin Core Archive:
 
<pre>coreid,identifier,type,format,accessURI,rights,owner,creator,metadataLanguage
e24899c2-f13a-4d51-8733-bdf666b390d9,urn:uuid:32e5da5d-c747-435c-a368-07d989259bf4,StillImage,image/jpeg,http://example.com/00000001,https://creativecommons.org/publicdomain/zero/1.0/,Museum of the USA,John Smith,eng
</pre>
 
Another variation based on real-world data:
 
<pre>coreid,identifier,type,format,accessURI,rights,owner,creator,metadataLanguage
urn:catalog:MUSA:fish:123,32e5da5d-c747-435c-a368-07d989259bf4,StillImage,image/jpeg,http://example.com/IMAGES/00000001.jpg,CC0,Museum of the USA,John Smith,eng
</pre>
 
 
The columns are defined in the accompanying meta.xml:
 
<pre>
TBD
</pre>
 
 
*If submitting media records with specimen data records, here are the critical fields to fill in:
*If submitting media records with specimen data records, here are the critical fields to fill in:
** sample of fully-populated AC record
**'''coreid''' - If media data are being provided via an extension, the coreid field in the Audubon Core extension file is what links the media record to the specimen record. "coreid" is not a term defined by Darwin Core or Audubon Core. The value in the extension coreid column will link to a value in the core file "id" column (normally column 0). Examples: <pre>urn:catalog:institutionCode:collectionCode:catalogNumber</pre><pre>urn:uuid:32e5da5d-c747-435c-a368-07d989259bf4</pre><pre>123456</pre>
***'''id (dc:identifier)''' = (this is the coreid field in the Audubon Core extension file), it matches one identifier among the related specimen records <pre>urn:catalog:institutionCode:collectionCode:catalogNumber</pre>
**'''identifier''' ([http://purl.org/dc/terms/identifier dcterms:identifier] or [http://purl.org/dc/elements/1.1/identifier dc:identifier]) = The persistent and unique id of the media record within the Audubon Core file. It may be tempting to use the URL of the media as the identifier. However, we have seen multiple cases where media have moved, making the identifier not persistent. If you have multiple types of identifiers for a media, put the least stable here and the most stable in ac:providerManagedID.  Examples: <pre>urn:uuid:84fb24fa-fd15-476a-99a6-a7f876b87d08</pre>
***'''identifier (dc:identifier)''' = id of the media record - needs to be persistent and unique within Audubon Core file. It may be tempting to use the URL of the media as the identifier. However, we have seen multiple cases where media have moved, making the identifier not persistent. If you have multiple types of identifiers for a media, put the least stable here and the most stable in ac:providerManagedID <pre>urn:uuid:84fb24fa-fd15-476a-99a6-a7f876b87d08</pre><pre>123456</pre>
**'''format''' ([http://purl.org/dc/elements/1.1/format dc:format]) = Media Type / MIME Type (from http://www.iana.org/assignments/media-types/media-types.xhtml controlling vocabulary if possible). Examples: <pre>image/jpeg</pre><pre>audio/mpeg</pre>
***'''format (dc:format)''' = Media Type / MIME Type (from http://www.iana.org/assignments/media-types/media-types.xhtml controlling vocabulary if possible) <pre>image/jpeg</pre>
**'''accessURI''' ([http://rs.tdwg.org/ac/terms/accessURI ac:accessURI]) = direct http link to the media file. Note that the media type (format) *must* match the media type of the resource at the target end of this accessURI. For example, if the format is "image/jpeg" then accessURI '''must''' link to an image, not a web page. Examples: <pre>http://example.com/IMAGES/00000001.jpg</pre><pre>http://example.com/objects/987654321</pre>
***'''accessURI (ac:accessURI)''' = direct http link to the media file. Note that the media type (format) *must* match the media type of the resource at the target end of this accessURI. For example, if the format is "image/jpeg" then accessURI '''must''' link to an image, not a web page.<pre>http://example.com/IMAGES/00000001.jpg</pre>
**'''providerManagedID''' ([http://rs.tdwg.org/ac/terms/providerManagedID ac:providerManagedID]) =  (Optional) If you have a stable UUID GUID for your media records and you have populated "dc:identifier" with a different type of identifier, place the guid in the optional ac:providerManagedID field. Examples: <pre>urn:uuid:32e5da5d-c747-435c-a368-07d989259bf4</pre>
***'''providerManagedID (ac:providerManagedID)''' if you have a stable UUID GUID for your media records and you have populated "dc:identifier" with a different type of identifier, place the guid in the optional ac:providerManagedID field. <pre>urn:uuid:32e5da5d-c747-435c-a368-07d989259bf4   (optional)</pre>
 
Note: dc:terms format and dc:type should match the type of the object returned by ac:accessURI (If ac:accessURI is not present, dc:terms format and dc:type should not be present either), especially in the case where ac:furtherInformationURL is used as an alternative to ac:accessURI.
'''Note:''' dc:terms format and dc:type should match the type of the object returned by ac:accessURI (If ac:accessURI is not present, dc:terms format and dc:type should not be present either), especially in the case where ac:furtherInformationURL is used as an alternative to ac:accessURI.  Media embedded on a webpage is a considered a webpage and thus will not be treated as media.  accessURI should point to the media itself.




1,554

edits