Transcription Hackathon Reconciliation of Replicates Planning

From iDigBio

We worked on tools to help with reconciling and interpreting crowd-sourced data. One possible workflow might go like this:

   Start with crowd-sourced transcriptions.
   → reconcile replicates (→ filter out irreconcilable records?)
   if locality:
       → place name matching
       → geocoding
   if names:
       → name splitting
       → name list lookup
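The workflow above can be sketched as a small per-field pipeline. This is a minimal sketch under stated assumptions, not the hackathon's GitHub code: `reconcile` uses a simple strict-majority vote (a swapped-in stand-in for the approaches listed below), and `match_place_names`, `geocode`, `split_names`, `look_up_names`, `STEPS` and `process_field` are hypothetical placeholder names.

```python
from collections import Counter
from typing import Callable, Optional

# All step functions here are hypothetical placeholders for illustration.

def reconcile(transcriptions: list[str]) -> Optional[str]:
    """Strict-majority vote over stripped, non-empty transcriptions; None if no majority."""
    cleaned = [t.strip() for t in transcriptions if t.strip()]
    if not cleaned:
        return None
    value, count = Counter(cleaned).most_common(1)[0]
    return value if count > len(cleaned) / 2 else None

def match_place_names(locality: str) -> str:
    return locality  # placeholder: would match against a gazetteer

def geocode(locality: str) -> dict:
    # placeholder: would call a geocoding service and fill in coordinates
    return {"locality": locality, "lat": None, "lon": None}

def split_names(names: str) -> list[str]:
    return [n.strip() for n in names.split(";") if n.strip()]

def look_up_names(names: list[str]) -> list[str]:
    return names  # placeholder: would link each name to an external collector list

# Which processing steps follow reconciliation, per field type.
STEPS: dict[str, list[Callable]] = {
    "locality": [match_place_names, geocode],
    "names": [split_names, look_up_names],
}

def process_field(field: str, transcriptions: list[str]):
    value = reconcile(transcriptions)
    if value is None:
        return None  # irreconcilable: filter out or route to a human reviewer
    for step in STEPS.get(field, []):
        value = step(value)
    return value
```

For example, `process_field("names", ["J. Smith; M. Jones", "J. Smith; M. Jones", "J. Smyth; M. Jones"])` reconciles to the majority string and then splits it into `["J. Smith", "M. Jones"]`; fields with no listed steps pass through after reconciliation.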

Reconciliation: a range of approaches:

  • Have a super-user finalize/approve transcriptions instead of trying to resolve multiple submissions.
  • Or, given multiple transcriptions, pick the one that minimizes some edit distance (e.g., its total distance to the other submissions).
  • Or, use sequence alignment tools to find the best transcription of each subregion within a longer string. (The GitHub code does this.)
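The edit-distance option can be illustrated with a short sketch. It assumes plain Levenshtein distance (unit cost for insertions, deletions and substitutions) and picks the submission with the smallest total distance to all the others; the function names `levenshtein` and `pick_consensus` are illustrative, not taken from the hackathon's GitHub code.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit costs, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def pick_consensus(transcriptions: list[str]) -> str:
    """Return the transcription with the smallest summed distance to all others.

    Ties go to the earliest transcription in the list.
    """
    return min(
        transcriptions,
        key=lambda t: sum(levenshtein(t, other) for other in transcriptions),
    )
```

With `["Gainesville, FL", "Gainsville, FL", "Gainesville FL"]`, the first string is one edit away from each of the others (total 2), while the other two each total 3, so `pick_consensus` returns `"Gainesville, FL"`. Comparing all pairs is quadratic in the number of submissions, which is usually fine for the handful of replicates per field.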

Locality: Again, there is a range of approaches, but we probably want to clean up the transcribed string before sending it to a geocoding service.
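As an illustration of that cleanup step, here is a minimal sketch that normalizes whitespace and punctuation spacing and expands a few abbreviations. The abbreviation table, `expand_token` and `clean_locality` are hypothetical; which cleanups actually help depends on the data and on the geocoding service used.

```python
import re

# Hypothetical abbreviation table for illustration; a real project would
# derive its own list from the data and the target gazetteer.
ABBREVIATIONS = {
    "mi.": "miles",
    "co.": "County",
    "fla.": "Florida",
}

def expand_token(token: str) -> str:
    """Expand a known abbreviation, keeping any trailing comma or semicolon."""
    core = token.rstrip(",;")
    tail = token[len(core):]
    return ABBREVIATIONS.get(core.lower(), core) + tail

def clean_locality(raw: str) -> str:
    text = re.sub(r"\s+", " ", raw).strip()       # collapse whitespace, incl. line breaks
    text = re.sub(r"\s+([,;])", r"\1", text)      # no space before a comma/semicolon
    text = re.sub(r"([,;])(?=\S)", r"\1 ", text)  # one space after a comma/semicolon
    words = [expand_token(w) for w in text.split(" ")]
    return " ".join(words).rstrip(" ,;")
```

For instance, `clean_locality("  3 mi. N of  Gainesville ,Fla. ")` yields `"3 miles N of Gainesville, Florida"`, a more regular string to hand to place-name matching and geocoding.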

Names: Processing depends on the target database structure: maybe you just want one string, or maybe you want to separate the individual names. Once separated, names could be compared with, or linked to, an external list of collectors. That could be part of a larger QA process: does the collection date make sense, given the collector's life span? (The GitHub code tries to do this.)
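A minimal sketch of name splitting plus the life-span check, under loud assumptions: names are assumed to be separated by `;`, `&` or the word "and"; `COLLECTORS` is a hypothetical external list whose people and years are made up for illustration; and `split_names` and `check_collection_year` are illustrative names, not the GitHub code's API.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Collector:
    name: str
    born: Optional[int]   # birth year, None if unknown
    died: Optional[int]   # death year, None if living or unknown

# Hypothetical external collector list, keyed by lower-cased name string.
# The people and years are made up for illustration.
COLLECTORS = {
    "j. smith": Collector("J. Smith", 1850, 1920),
    "m. jones": Collector("M. Jones", 1880, None),
}

def split_names(field: str) -> list[str]:
    """Split a collector field on ';', '&', or the word 'and'."""
    parts = re.split(r"\s*(?:;|&|\band\b)\s*", field)
    return [p.strip() for p in parts if p.strip()]

def check_collection_year(names: list[str], year: int) -> list[str]:
    """Return QA warnings for names not in the list or not alive in `year`."""
    warnings = []
    for name in names:
        collector = COLLECTORS.get(name.lower())
        if collector is None:
            warnings.append(f"{name}: not in collector list")
        elif collector.born is not None and year < collector.born:
            warnings.append(f"{name}: collected before birth ({collector.born})")
        elif collector.died is not None and year > collector.died:
            warnings.append(f"{name}: collected after death ({collector.died})")
    return warnings
```

For example, `check_collection_year(split_names("J. Smith & M. Jones; R. Lee"), 1925)` returns `["J. Smith: collected after death (1920)", "R. Lee: not in collector list"]`. Exact string matching on the key is a simplification; linking to a real collector list would need fuzzier matching of initials and spelling variants.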
