New Article Published on Crowdsourcing Transcription in e-Science

Mon, 2014-11-10 10:35 -- maphillips

Andréa Matsunaga, Austin Mast, and José Fortes recently published an article in the proceedings of the 10th IEEE International Conference on e-Science. The article, "Reaching Consensus in Crowdsourced Transcription of Biocollections Information," presents methods for reaching consensus from crowdsourced transcriptions of collection labels. The amount of information associated with biological collections is staggering, and the data are becoming increasingly available because of digitization initiatives fueled by the NSF's ADBC (Advancing Digitization of Biodiversity Collections) program.

The challenge facing collections and researchers now is how to efficiently digitize and transcribe all these specimen data. Digitized materials that need transcription include:

  • Catalogs
  • Original field notes
  • Specimen labels
  • Herbarium sheets

Crowdsourcing is an appealing way to transcribe digitized collection materials because it provides:

  • Lower costs
  • Increased production (compared with transcription rates of collection staff alone)
  • Human intelligence (instead of relying on OCR and parsers)
  • Interaction between collections and the public

The challenge when using crowdsourcing is that workers sometimes make errors. Many projects combat this by building redundancy into their workflows (having multiple workers transcribe the same record), which increases the reliability of the data.

"Reaching Consensus in Crowdsourced Transcription of Biocollections Information" addresses how to reach consensus when two or more workers transcribe the same data differently. The article:

  • Describes the influence of string comparison algorithms on reaching consensus over transcribed and interpreted data, using a popular crowdsourcing platform, Notes from Nature, as an example.
  • Presents a strategy for producing a consensus response with high accuracy.
  • Proposes a controller that minimizes the number of workers required for a particular task.
  • Offers recommendations for improving the design of crowdsourcing tasks.
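The paper itself evaluates several string comparison algorithms and consensus strategies; the sketch below is only a rough illustration of the general idea, not the authors' method. It uses Python's standard-library `SequenceMatcher` to group near-identical answers (the similarity threshold and the normalization steps are arbitrary assumptions) and accepts a transcription only when a strict majority of workers agree:

```python
from difflib import SequenceMatcher

def consensus(transcriptions, threshold=0.9):
    """Cluster near-identical transcriptions; return a representative of the
    largest cluster if it holds a strict majority, else None (i.e., escalate
    to another worker or an expert)."""
    clusters = []  # each cluster is a list of normalized, similar strings
    for t in transcriptions:
        norm = " ".join(t.split()).lower()  # collapse whitespace, ignore case
        for cluster in clusters:
            if SequenceMatcher(None, norm, cluster[0]).ratio() >= threshold:
                cluster.append(norm)
                break
        else:
            clusters.append([norm])
    best = max(clusters, key=len)
    return best[0] if 2 * len(best) > len(transcriptions) else None

# Two of three workers agree once case and spacing are normalized.
print(consensus(["Quercus alba L.", "quercus  alba l.", "Quercus rubra"]))
# → quercus alba l.
```

A task controller in the spirit the paper suggests could call such a function after each new submission and stop recruiting workers as soon as a majority emerges, minimizing the number of redundant transcriptions per record.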

The paper will soon be available electronically through the IEEE Digital Library.

Want to learn more?