Input CSV Format


Input CSV Format for iDigBio Media Appliance

The appliance will build an input CSV file by scanning a specified directory for media ("Generate CSV"). This is the most common workflow for the media appliance.

Advanced users may wish to create their own input CSV. The following tables describe the required input file format.

| Column Name | Required | Description | Example |
|---|---|---|---|
| idigbio:recordID or idigbio:MediaGUID | Y | Identifier for the record / row in the input file. This must be a true GUID without a prefix. | "8208561b-49fb-42a1-95c4-750dbb579174" |
| idigbio:OriginalFileName or ac:accessURI | Y | For idigbio:OriginalFileName, the entire path+filename as stored on the appliance user's local disk in OS / filesystem syntax. | "C:\images\AMNH_PBI\AMNH_PBI 00010147 lateral.jpg", "/home/Plants/VSC0043330.jpg" |
| | | For ac:accessURI, the entire path+filename as stored on the appliance user's local disk in URI syntax. | "file:///C:/images/AMNH_PBI/AMNH_PBI 00010147 lateral.jpg" |
| dc:format | Y | Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME]. | "image/jpeg", "audio/mpeg" |
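
As a minimal sketch of what a hand-built file could look like, the Python snippet below writes one row containing only the three required columns. The helper names `build_row` and `write_input_csv`, the in-memory output buffer, and the use of `uuid.uuid4()` and `mimetypes.guess_type()` to fill in values are assumptions made for illustration; they are not part of the appliance.

```python
import csv
import io
import mimetypes
import uuid

# Required column names, taken from the table above.
REQUIRED_COLUMNS = ["idigbio:recordID", "idigbio:OriginalFileName", "dc:format"]


def build_row(path):
    """Return one input row for a media file (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path)
    return {
        # A bare GUID with no prefix such as "urn:uuid:".
        "idigbio:recordID": str(uuid.uuid4()),
        # Full path in the local filesystem's own syntax.
        "idigbio:OriginalFileName": path,
        # Left empty if the media type cannot be guessed from the extension.
        "dc:format": mime_type or "",
    }


def write_input_csv(paths, out):
    """Write a header line plus one row per media path to the file object `out`."""
    writer = csv.DictWriter(out, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    for path in paths:
        writer.writerow(build_row(path))


buffer = io.StringIO()
write_input_csv(["/home/Plants/VSC0043330.jpg"], buffer)
print(buffer.getvalue())
```

To write to disk instead, the buffer could be swapped for a file opened with `open(name, "w", newline="")`, which is what the `csv` module recommends for output files.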

Additional media metadata such as Audubon Core (AC) terms can optionally be included in the input file. Some recommended fields are given below.

| Column Name | Required | Description | Example |
|---|---|---|---|
| dcterms:identifier | N | GUID for a digital multimedia object that normally originates in the source database. | "urn:uuid:3c1dc496-e5c6-4849-b616-cada2896190d", " 00010147 lateral.jpg" |
| dcterms:title | N | Concise title of individual resource. | herbarium sheet of Abarema abbottii (Rose & Leonard) Barneby & J.W.Grimes |
| ac:associatedSpecimenReference | N | A reference to the occurrenceID of a specimen associated with this resource. | 0e1e12ed-2261-42db-8719-ee98532dab06 |
| dc:rights or dcterms:rights | N | Either term may be used; dcterms:rights is preferred. | dc:rights: "CC BY-NC" |
| xmpRights:Owner | N | A list of the names of the owners of the copyright (the one in the dc:rights field). 'Unknown' is an acceptable value, but 'Public Domain' is not. | New York Botanical Garden |
| ac:licenseLogoURL | N | | |
| dc:creator | N | The person or organization responsible for creating the media resource; might be less encompassing than what is in xmpRights:Owner. | "New York Botanical Garden" or "Jane Doe, Digital Media Manager, New York Botanical Garden" |
| dc:type | N | | StillImage, Sound, MovingImage |
| dc:subtype | N | | Photograph |
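
When only some records carry optional metadata, every row still needs the same number of columns. The sketch below shows one way to keep them aligned in Python, using `csv.DictWriter` with `restval=""` so that missing optional values are written as empty fields. The choice of optional columns, the second record's file path, and the sample values are assumptions for illustration only.

```python
import csv
import io

REQUIRED = ["idigbio:recordID", "idigbio:OriginalFileName", "dc:format"]
# An arbitrary selection of the optional columns listed above.
OPTIONAL = ["dcterms:title", "dcterms:rights", "xmpRights:Owner", "dc:creator", "dc:type"]

records = [
    {
        "idigbio:recordID": "8208561b-49fb-42a1-95c4-750dbb579174",
        "idigbio:OriginalFileName": "/home/Plants/VSC0043330.jpg",
        "dc:format": "image/jpeg",
        "xmpRights:Owner": "New York Botanical Garden",
        "dc:type": "StillImage",
    },
    {
        # Hypothetical second record with no optional metadata at all.
        "idigbio:recordID": "3c1dc496-e5c6-4849-b616-cada2896190d",
        "idigbio:OriginalFileName": "/home/Plants/VSC0043331.jpg",
        "dc:format": "image/jpeg",
    },
]

buffer = io.StringIO()
# restval="" fills any column missing from a record with an empty field.
writer = csv.DictWriter(buffer, fieldnames=REQUIRED + OPTIONAL, restval="")
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```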

The following information is deprecated and is being maintained for historical reference only. Use Audubon Core terms to provide additional media metadata instead.

| Column Name | Required | Description | Example |
|---|---|---|---|
| idigbio:CollectionObjectGUID | N | Relates the media record identified by idigbio:MediaGUID to a specimen record (CollectionObject concept). | "", "urn:uuid:3c1dc496-e5c6-4849-b616-cada2896190d" |
| idigbio:Title | N | Concise title, name, or brief descriptive label of individual resource. This field should include the complete title with all the subtitles, if any. | "Ilex glabra from FSU" |
| idigbio:Description | N | Description of the individual resource, containing the Who, What, When, Where and Why as free-form text. | "Scanned herbarium sheet with specimen collected West of Plant City 4 miles from Mango Jct., on Hwy 92." |
| idigbio:LanguageCode | N | Code for the language used in the title and description. Must be in ISO 639-1 format. | "en", "es", "pt" |
| idigbio:DigitizationDevice | N | Free form text describing the device or devices used to create the resource. | "Canon Supershot 2000", "Makroscan Scanner 2000", "Zeiss Axioscope with Camera IIIu", "SEM (Scanning Electron Microscope)" |
| idigbio:NominalPixelResolution | N | The real size of the pixel depicted in the image (e.g., microscopy). Include a number and a unit. | "128µm" |
| idigbio:Magnification | N | Magnification applied when capturing the image of an object. | "4x", "100x" |
| idigbio:OcrOutput | N | Output of the process of applying OCR to the multimedia object. | "\tThe New York Botanical Garden\n\tLICHENS of NEW YORK STATE, U.S.A.\n\n Polycoccum minutulum Kocourková & F. Berger\n\n on Trapelia placodioides Coppins & P. James\n\nRockland County: Harriman State Park, along\n Woodtown Road West near dam at S end of Lake\n Sebago along Seven Lakes Drive, 41°11'N, 74°08'W,\n ca. 240 m; mixed hardwood-hemlock forest with\n granitic erratics.\n\n19 April 1998\n\nRichard C. Harris 42164\tNEW YORK BOTANICAL GARDEN\n\n\t\t\t\t\t01075759\n" |
| idigbio:OcrTechnology | N | Free form text describing the software utilized for OCR as well as any additional technique (cropping, color alteration applied, controlled vocabulary). | "Tesseract version 3.01 on Windows, latin character set" |
| idigbio:InformationWithheld | N | Indication that additional information exists and that it has not been shared in the given record due to sensitive nature. It does not contain the withheld information itself. Should include information on how to obtain the withheld information by other means (e.g., a contact). | "location information not given for endangered species, contact my@email", "collector identities withheld, contact xyz", "ask about tissue samples by contacting my@email" |

Invalid input CSV file
The image ingestion appliance checks the validity of the input CSV file. A file is invalid in either of the following cases:
- The number of columns differs among rows.
- A field value contains a double quotation mark (").
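
The two rules above can be approximated with a short pre-flight check before handing a file to the appliance. The following Python sketch is an assumption about how such a check might look, not the appliance's own validation code; the function name `find_problems` is hypothetical. It inspects parsed field values, so the quotation marks that CSV itself uses to wrap a field are not flagged, only quotation marks inside the value.

```python
import csv
import io


def find_problems(csv_text):
    """Return a list of human-readable problems found in csv_text."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    # The header row defines how many columns every other row must have.
    expected = len(rows[0]) if rows else 0
    for number, row in enumerate(rows, start=1):
        if len(row) != expected:
            problems.append(
                f"row {number}: {len(row)} columns, expected {expected}"
            )
        if any('"' in value for value in row):
            problems.append(
                f"row {number}: field contains a double quotation mark"
            )
    return problems


sample = 'idigbio:recordID,dc:format\r\nabc,image/jpeg,extra\r\n'
for problem in find_problems(sample):
    print(problem)
```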

Specimen Record UUID Field
To relate media records to specimens, users can provide the field "idigbio:SpecimenRecordUUID" in the CSV file.
Values of this field should be the UUIDs of the specimen records. Multiple UUIDs can be specified for one media record, separated by ",".
For example:
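
As an illustration only, the Python sketch below writes one media row linked to two specimen records. The identifiers are reused from the examples elsewhere on this page and do not describe real relationships, and the column layout is an assumption. Because the value contains a comma, `csv.writer` wraps the whole field in quotation marks so that it still counts as a single column.

```python
import csv
import io

# Sample identifiers, used here purely for illustration.
media_guid = "8208561b-49fb-42a1-95c4-750dbb579174"
specimen_uuids = [
    "0e1e12ed-2261-42db-8719-ee98532dab06",
    "3c1dc496-e5c6-4849-b616-cada2896190d",
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([
    "idigbio:recordID",
    "idigbio:OriginalFileName",
    "dc:format",
    "idigbio:SpecimenRecordUUID",
])
writer.writerow([
    media_guid,
    "/home/Plants/VSC0043330.jpg",
    "image/jpeg",
    ",".join(specimen_uuids),  # multiple UUIDs joined with ","
])
print(buffer.getvalue())

# Reading the file back recovers the individual UUIDs.
rows = list(csv.DictReader(io.StringIO(buffer.getvalue(), newline="")))
parsed_uuids = rows[0]["idigbio:SpecimenRecordUUID"].split(",")
```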

Examples of the input CSV files: Example 1 Example 2 Example 3