Transcription Hackathon

From iDigBio
Revision as of 17:04, 20 December 2013 by Snomelf


Notes from Nature/iDigBio Hackathon to Further Enable Public Participation in the Online Transcription of Biodiversity Specimen Labels

December 16–20 at the University of Florida, Gainesville

Agenda and Logistics

Coordination

Presentations

Development Resources

  • <a href="https://github.com/idigbio-citsci-hackathon">GitHub organization for this Transcription Hackathon</a>
  • Four existing crowdsourcing datasets from Notes from Nature, each containing transcriptions of a different type of collection label. Read more <a href="https://docs.google.com/document/d/1UCz5WblnNIvqBErX-XeWgS9mf69qFhycHqntQOGnPp4/edit?usp=sharing">here</a>. After anonymization, the datasets were shared through Dropbox with hackathon participants only. They will be made public once we receive definitive approval from Notes from Nature (NfN).
    • Calbug dataset
    • Herbarium labels: files whose names contain "USAM_" make up a nearly complete set of recent transcriptions from a single collection (the University of South Alabama Herbarium), with what appear to be four replicates for most specimens.
    • Macrofungi labels
    • Ornithological dataset
  • Existing solution datasets for assessing the quality of crowdsourced consensus (we are working to obtain "gold standard" data for some of these):
    • Herbarium labels ideal response: link to be provided by Austin
    • Entomology labels ideal response: link to be provided by Austin
    • Field notebooks ideal response: link to be provided by Austin
  • For those interested in experimenting with the images that have been used for public participation in transcription:
    • Herbarium label images: the set of ca. 16,000 "USAM" images used for some of the herbarium transcriptions is available at <a href="http://www.specimenimaging.com/images/USAM/">USAM Herbarium Images</a>. The set amounts to several GB of image files; to download them in bulk, you could use the DownThemAll! Firefox add-on.
  • <a _fcknotitle="true" href="CYWG iDigBio Image Ingestion Appliance">CYWG iDigBio Image Ingestion Appliance</a>:
    • The appliance can ingest the images used by the crowdsourcing service into iDigBio storage, where they become publicly accessible over HTTP. The appliance can also export the mapping between image filenames and their URLs in CSV format.
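
The appliance's CSV export is what lets a crowdsourcing project look up the public URL for each image. Below is a minimal Python sketch of consuming such an export. It assumes the CSV has a header row with columns named `filename` and `url`. Those column names, the helper functions, and the example.org URLs are all hypothetical, so check them against a real export before relying on this. The sketch builds a filename-to-URL lookup and flags local images that have no URL (for example, ones that were never ingested):

```python
import csv
import io


def load_url_map(stream, filename_col="filename", url_col="url"):
    """Read the appliance's CSV export into a {filename: url} dict.

    `filename_col` and `url_col` are hypothetical column names (assumptions);
    adjust them to match the header of the actual export.
    """
    reader = csv.DictReader(stream)
    return {row[filename_col].strip(): row[url_col].strip() for row in reader}


def attach_urls(filenames, url_map):
    """Split local image filenames into those with a public URL and those without."""
    found = {name: url_map[name] for name in filenames if name in url_map}
    missing = [name for name in filenames if name not in url_map]
    return found, missing


if __name__ == "__main__":
    # Made-up export contents; a real run would pass open(path, newline="").
    export = io.StringIO(
        "filename,url\n"
        "USAM_000001.jpg,https://example.org/images/USAM_000001.jpg\n"
        "USAM_000002.jpg,https://example.org/images/USAM_000002.jpg\n"
    )
    url_map = load_url_map(export)
    found, missing = attach_urls(["USAM_000001.jpg", "USAM_000003.jpg"], url_map)
    print(found)
    print(missing)  # ['USAM_000003.jpg']
```

With a real export, replace the `StringIO` with `open(path, newline="")`, which is how the `csv` module expects files to be opened.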

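The herbarium dataset has several independent transcriptions per specimen, and the solution datasets are meant for judging how good a crowdsourced consensus is. Below is a simple Python sketch of one way to do that. It takes a plain majority vote over replicates after whitespace and case normalization, then scores the consensus against the ideal responses. The data layout (a dict mapping a specimen ID to the replicate strings for one field) and the function names are hypothetical. Majority voting is also only one possible consensus method, not necessarily the one Notes from Nature uses:

```python
from collections import Counter


def normalize(value):
    """Collapse runs of whitespace and lowercase, so trivial differences don't split votes."""
    return " ".join(value.split()).lower()


def consensus(replicates):
    """Majority vote over replicate transcriptions of one field for one specimen.

    Blank replicates are ignored; returns None if every replicate is blank.
    Ties go to the value seen first (Counter.most_common keeps first-encountered order).
    """
    votes = [normalize(v) for v in replicates if v and v.strip()]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]


def consensus_accuracy(transcriptions, gold):
    """Fraction of gold-standard specimens whose consensus matches the ideal response.

    transcriptions: hypothetical dict, specimen ID -> list of replicate strings
    gold: hypothetical dict, specimen ID -> ideal transcription
    Specimens absent from `transcriptions` are skipped.
    """
    scored = [sid for sid in gold if sid in transcriptions]
    if not scored:
        return 0.0
    hits = sum(
        1 for sid in scored
        if consensus(transcriptions[sid]) == normalize(gold[sid])
    )
    return hits / len(scored)


if __name__ == "__main__":
    # Made-up example values, not taken from the Notes from Nature datasets.
    replicates = {
        "USAM_example_1": ["Mobile Co.", "mobile co.", "Mobile Co", "Mobile  Co."],
        "USAM_example_2": ["Baldwin Co.", "Baldwin Cty", "Baldwin Cty", ""],
    }
    ideal = {"USAM_example_1": "Mobile Co.", "USAM_example_2": "Baldwin Co."}
    print(consensus(replicates["USAM_example_1"]))  # mobile co.
    print(consensus_accuracy(replicates, ideal))  # 0.5
```

The second example shows the main weakness of exact-match voting. Two participants agree on an abbreviation that differs from the ideal response, so the consensus is scored as wrong. Comparing fields with a string-similarity measure instead of exact equality would be a natural next step to try.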

Hackathon Products

<a _fcknotitle="true" href="Category:Transcription_Hackathon">Transcription_Hackathon</a>