Transcription Hackathon: Difference between revisions

From iDigBio
*[https://www.idigbio.org/wiki/index.php/Transcription_Hackathon_Interoperability_Planning Interoperability Track]
*[https://www.idigbio.org/wiki/index.php/Transcription_Hackathon_OCR_Integration_Planning OCR Integration Track]
*[https://www.idigbio.org/wiki/index.php/Transcription_Hackathon_Reconciliation_of_Replicates_Planning QA/QC and Reconciliation of Replicates Track]
*[https://www.idigbio.org/wiki/index.php/Transcription_Hackathon_User_Engagement_Planning User Engagement Track]



Revision as of 09:18, 5 December 2013

Notes from Nature/iDigBio Hackathon to Further Enable Public Participation in the Online Transcription of Biodiversity Specimen Labels

December 16–20 at the University of Florida, Gainesville

Agenda and Logistics

Coordination

Development Resources

  • Existing crowdsourcing datasets from Notes from Nature, with transcriptions of different types of collection labels:
    • Herbarium labels: link to be provided by NfN or Austin
    • Entomology labels: link to be provided by NfN or Austin
    • Field notebooks: link to be provided by NfN or Austin
  • Existing solution datasets for assessing the quality of the crowdsourcing consensus:
    • Herbarium labels ideal response: link to be provided by Austin
    • Entomology labels ideal response: link to be provided by Austin
    • Field notebooks ideal response: link to be provided by Austin
  • CYWG iDigBio Image Ingestion Appliance:
    • The appliance can be used to ingest the images for the crowdsourcing service into iDigBio storage and make them publicly accessible through HTTP. The appliance can export the mapping between image filenames and URLs in CSV format.
  • Code from the aOCR Hackathon:
    • HandwritingDetection (https://github.com/idigbio-aocr): an algorithm that sorts images into three sets (no handwriting; little handwriting, i.e. mostly typed or printed text; lots of handwriting) based on the noise generated by the OCR software. Read more at Ben's blog. This could be used to rank images by how much they need human transcription.
    • Dictionaries to improve crowdsourcing consensus (e.g., names of collectors, scientific names): link to be provided by aOCR?
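
As a rough illustration of the ranking idea in the HandwritingDetection item above, here is a minimal Python sketch. It is not the HandwritingDetection code itself: the noise measure (the fraction of OCR tokens that do not look like plain words or numbers), the function names, and the sample filenames are all assumptions made for this example.

```python
import re

# Assumption: a "clean" token is a run of 2+ letters or a run of digits,
# optionally followed by one punctuation mark. Everything else counts as noise.
CLEAN_TOKEN = re.compile(r"(?:[A-Za-z]{2,}|\d+)[.,;:]?")


def ocr_noise_score(ocr_text: str) -> float:
    """Return the fraction of whitespace-separated tokens that look like OCR noise."""
    tokens = ocr_text.split()
    if not tokens:
        # Assumption: an image with no OCR output is treated as maximally noisy.
        return 1.0
    noisy = sum(1 for token in tokens if not CLEAN_TOKEN.fullmatch(token))
    return noisy / len(tokens)


def rank_for_transcription(ocr_by_image: dict[str, str]) -> list[tuple[str, float]]:
    """Sort images so that those with the noisiest OCR output come first."""
    scored = [(name, ocr_noise_score(text)) for name, text in ocr_by_image.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical OCR output for three label images.
    sample = {
        "typed.jpg": "Quercus alba collected 1923 Florida",
        "mixed.jpg": "Quercus alba ~|| ;;x 1923",
        "handwritten.jpg": "~|| ;;x _/ ,.'",
    }
    for name, score in rank_for_transcription(sample):
        print(f"{name}\t{score:.2f}")
```

Images at the top of the ranking would be the first candidates for human transcription. A real noise measure would probably also use the dictionaries mentioned above (collector names, scientific names) rather than a single regular expression.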