Example of transformations on InvertNet image metadata dataset

From iDigBio

Introduction


This page describes how the InvertNet TCN media metadata was prepared for ingestion by iDigBio, a process worked out through e-mail exchanges and meetings with the CYWG. InvertNet is currently focusing on imaging trays of vials and boxes of slides. A single resource is sometimes composed of multiple images (e.g., http://invertnet.org/resources/4923 is composed of 2 images).
Caution: The description of the AC metadata in this document applies to the standard from 2013. Later developments have changed what would be acceptable now, e.g., bestQualityAccessURI has been replaced by accessURI.

Structural Transformation


Since Audubon Core (AC) can represent only a single media object per row, and each InvertNet resource could contain multiple media objects, the first transformation needed was structural. Thus, instead of having one row per resource, one row per image was created. In terms of Globally Unique IDentifiers (GUIDs), this meant having the term 'dcterms:identifier' populated with an HTTP-based URI of the form:

https://invertnet.org/resources/<resource_#>/images-<image_#>

Instead of simply:

https://invertnet.org/resources/<resource_#>
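The structural step above can be sketched in a few lines of Python. This is only an illustration, not the script InvertNet actually used: the function name `expand_resource` and the assumption that images are numbered from 1 are hypothetical, while the URI pattern is the one given above.

```python
# Base of the HTTP-based identifiers, as given in the text above.
BASE = "https://invertnet.org/resources"


def expand_resource(resource_id, image_count):
    """Return one row (dict) per image instead of one row per resource.

    Assumption: images within a resource are numbered 1..image_count.
    """
    return [
        {"dcterms:identifier": f"{BASE}/{resource_id}/images-{n}"}
        for n in range(1, image_count + 1)
    ]


# Resource 4923 is composed of 2 images, so it yields 2 rows.
rows = expand_resource(4923, 2)
for row in rows:
    print(row["dcterms:identifier"])
```

Each resulting row carries its own GUID, so the rest of the AC terms can then be filled in per image.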

Intellectual Property License Choice


Since InvertNet selected CC BY-NC-SA as the license for all images created by the TCN, this entailed adding the following required terms defined by the [[MISC-Authority-File-Working-Group | MISC WG]] for Media:

  1. 'xmpRights:UsageTerms' containing value 'CC BY-NC-SA'
  2. 'xmpRights:WebStatement' containing value 'http://creativecommons.org/licenses/by-nc-sa/3.0/'
  3. 'ac:licenseLogoURL' containing 'http://mirrors.creativecommons.org/presskit/buttons/80x15/png/by-nc-sa.png'

And optionally:

  1. 'dcterms:rights' containing 'by-nc-sa'
  2. 'xmpRights:Owner' containing for example 'University of Illinois at Urbana-Champaign', 'Michigan State University', or 'Purdue University'.
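Because the license is the same for every image, these terms can be added to each row as constants. The sketch below assumes rows are plain Python dictionaries keyed by term name. The helper name `add_license` is hypothetical, and the term values are copied from the lists above.

```python
# Required license terms, identical for every InvertNet image.
LICENSE_TERMS = {
    "xmpRights:UsageTerms": "CC BY-NC-SA",
    "xmpRights:WebStatement": "http://creativecommons.org/licenses/by-nc-sa/3.0/",
    "ac:licenseLogoURL": "http://mirrors.creativecommons.org/presskit/buttons/80x15/png/by-nc-sa.png",
}


def add_license(row, owner=None):
    """Return a copy of row with the license terms added.

    The optional terms are included too; xmpRights:Owner is only set
    when an owner is supplied.
    """
    out = dict(row)
    out.update(LICENSE_TERMS)
    out["dcterms:rights"] = "by-nc-sa"
    if owner is not None:
        out["xmpRights:Owner"] = owner
    return out


licensed = add_license(
    {"dcterms:identifier": "https://invertnet.org/resources/4923/images-1"},
    owner="University of Illinois at Urbana-Champaign",
)
```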

For advice on choosing an appropriate license, see the iDigBio Intellectual Property Policy.

Absence of specific specimen information


Since InvertNet is currently not capturing specific specimen information, terms that relate the image to a specimen are not necessary and were not included. These include 'coreid' and 'ac:associatedSpecimenReference'.

Mapping media information to standard AC terms


For media records, the most important information is the location of the best-quality version of the image and, optionally, of a thumbnail version. These are provided in the terms:

  1. 'ac:bestQualityAccessURI' containing a URL (e.g., 'https://invertnet.org/imagestor/vials/larges/2012/09/02/716515ddf5d642cf83506f1bf4b6a14c.jpg')
  2. 'ac:bestQualityFormat' with a valid MIME type (e.g., 'image/jpeg')
  3. 'ac:thumbnailAccessURI' containing a URL (e.g., 'https://invertnet.org/imagestor/vials/smalls/2012/09/02/716515ddf5d642cf83506f1bf4b6a14c.png')
  4. 'ac:thumbnailFormat' with a valid MIME type (e.g., 'image/png')

For additional information on acceptable media formats, see the iDigBio Image File Format Requirements and Recommendations.
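One way to keep the format terms consistent with the URLs is to derive the MIME type from the file extension. The following is a minimal sketch under that assumption. The function name `media_terms` and the small extension table are hypothetical and cover only a few common image types; the term names and example URLs come from the list above.

```python
import posixpath
from urllib.parse import urlparse

# Illustrative subset of extensions; extend as needed for other formats.
MIME_BY_EXTENSION = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def mime_type(url):
    """Look up the MIME type from the extension of the URL path."""
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    if extension not in MIME_BY_EXTENSION:
        raise ValueError(f"unknown media extension in {url!r}")
    return MIME_BY_EXTENSION[extension]


def media_terms(best_url, thumbnail_url=None):
    """Build the best-quality terms and, optionally, the thumbnail terms."""
    terms = {
        "ac:bestQualityAccessURI": best_url,
        "ac:bestQualityFormat": mime_type(best_url),
    }
    if thumbnail_url is not None:
        terms["ac:thumbnailAccessURI"] = thumbnail_url
        terms["ac:thumbnailFormat"] = mime_type(thumbnail_url)
    return terms


media = media_terms(
    "https://invertnet.org/imagestor/vials/larges/2012/09/02/716515ddf5d642cf83506f1bf4b6a14c.jpg",
    "https://invertnet.org/imagestor/vials/smalls/2012/09/02/716515ddf5d642cf83506f1bf4b6a14c.png",
)
```

Raising an error on an unknown extension, rather than guessing, keeps invalid MIME types out of the dataset.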

Mapping additional data to standard AC terms


InvertNet also captures certain information about the specimens (as tags) and about the capture process. The further transformations and mappings performed for this dataset are as follows:

  1. 'dwc:scientificName' containing taxonomy of the specimens captured in the image (e.g., 'apantesis', 'arctiidae', or 'plecoptera')
  2. 'dwc:nameAccordingTo' containing 'Catalogue of Life'
  3. 'dcterms:title' containing the resource name in InvertNet
  4. 'dcterms:description' containing the type of capture (e.g., 'rack of vials')
  5. 'ac:tag' with all tags in InvertNet concatenated with a comma (e.g., 'University of Illinois at Urbana-Champaign, plecoptera, kbs', or 'Michigan State University, florida, Apantesis, placentia, Michigan, Clinton County, parthenice, New York, Westchester County, Shiawassee County, arge, Wyoming, Barry County')
  6. 'ac:subjectPart' indicating the part of the specimen imaged (e.g., 'Whole Vial Rack')
  7. 'ac:subjectOrientation' containing orientation of the specimen as it was imaged (e.g., 'Lateral Vial View', or 'Slide Top View')
  8. 'ac:bestQualityExtent' with the size of the best quality image in pixels (e.g., '3829x2246 pixels')
  9. 'ac:captureDevice' with information about the imaging device used (e.g., 'CanoScan 9950F')
  10. 'dcterms:creator' with the creator of the image (e.g., 'Paul Brooks', or 'Nick Barc')
  11. 'dcterms:type' with the type of media (e.g., 'StillImage')
  12. 'ac:metadataLanguage' with the natural language used in the record (e.g., 'en')
  13. 'dcterms:provider' containing for example 'University of Illinois at Urbana-Champaign,InvertNet'
  14. 'ac:metadataProvider' containing 'InvertNet.org'
  15. 'ac:metadataCreator' containing 'InvertNet'
  16. 'dcterms:available' with the date when the resource was made available, in ISO 8601 format or 'YYYY-MM-DD hh:mm:ss' format.
  17. 'dcterms:modified' with the date when the resource was last modified, in ISO 8601 format or 'YYYY-MM-DD hh:mm:ss' format.
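Several of the mappings above are simple formatting steps: joining tags with a comma, writing the image size as an extent string, and formatting dates. The sketch below shows one way to do this. The function name `descriptive_terms` and the sample dates are hypothetical, while the tag values, pixel size, and target formats come from the examples in the list.

```python
from datetime import datetime

# 'YYYY-MM-DD hh:mm:ss' expressed as a strftime pattern.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def descriptive_terms(tags, width, height, available, modified):
    """Format tags, image extent, and dates as described in the list above."""
    return {
        "ac:tag": ", ".join(tags),
        "ac:bestQualityExtent": f"{width}x{height} pixels",
        "dcterms:available": available.strftime(DATE_FORMAT),
        "dcterms:modified": modified.strftime(DATE_FORMAT),
    }


terms = descriptive_terms(
    tags=["University of Illinois at Urbana-Champaign", "plecoptera", "kbs"],
    width=3829,
    height=2246,
    available=datetime(2012, 9, 2, 10, 30, 0),  # hypothetical date
    modified=datetime(2013, 1, 15, 8, 5, 0),    # hypothetical date
)
```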

