Transcription Hackathon Reconciliation of Replicates Planning


We worked on tools to help with reconciling and interpreting crowd-sourced data. One possible workflow might go like this:

   Start with crowd-sourced transcriptions.
   → reconcile ( → filter out irreconcilables?)
   if locality:
       → place name matching
       → geocoding
   if names:
       → name splitting
       → name list lookup
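
The workflow above could be wired together roughly as follows. This is only a sketch: `run_workflow`, `majority_or_none`, and the `STEPS` table are hypothetical names introduced here, and the steps are placeholders that record what would happen rather than calling a real gazetteer, geocoder, or name list. A strict-majority vote stands in for reconciliation so that disagreeing replicates are filtered out as irreconcilable.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Optional, Sequence

Step = Callable[[Any], Any]


def majority_or_none(replicates: Sequence[str]) -> Optional[str]:
    """Stand-in reconciler: accept a value only if a strict majority agrees."""
    if not replicates:
        return None
    value, count = Counter(replicates).most_common(1)[0]
    return value if count > len(replicates) / 2 else None


def run_workflow(
    replicates: Sequence[str],
    field_type: str,
    reconcile: Callable[[Sequence[str]], Optional[str]],
    steps_by_type: Dict[str, List[Step]],
) -> Any:
    """Reconcile the replicates, then apply the steps for this field type."""
    value = reconcile(replicates)
    if value is None:
        return None  # irreconcilable, so filtered out
    for step in steps_by_type.get(field_type, []):
        value = step(value)
    return value


# Placeholder steps (assumptions): each one only tags the string it receives.
STEPS: Dict[str, List[Step]] = {
    "locality": [lambda s: f"place-matched({s})", lambda s: f"geocoded({s})"],
    "names": [lambda s: f"split({s})", lambda s: f"looked-up({s})"],
}

result = run_workflow(
    ["Mt. Hood", "Mt. Hood", "Mt Hood"], "locality", majority_or_none, STEPS
)
print(result)  # geocoded(place-matched(Mt. Hood))
```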

Reconciliation: There is a range of approaches:

  • Have a super-user finalize or approve transcriptions, instead of trying to resolve multiple submissions.
  • Or, given multiple transcriptions, pick the one that minimizes some edit distance to the others.
  • Or, use sequence alignment tools to find the best transcription of each subregion in a larger string. (GitHub code does this.)
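
The second and third approaches might look something like the sketch below. It is not the hackathon's GitHub code. It assumes Levenshtein distance as the edit distance, and it uses `difflib.SequenceMatcher` from the Python standard library as a simple stand-in for a sequence alignment tool, aligning whitespace-separated tokens rather than characters. The function names `pick_consensus` and `vote_by_token` are made up for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def pick_consensus(transcriptions: Sequence[str]) -> str:
    """Return the replicate with the smallest summed distance to all replicates."""
    if not transcriptions:
        raise ValueError("need at least one transcription")
    return min(
        transcriptions,
        key=lambda t: sum(levenshtein(t, other) for other in transcriptions),
    )


def vote_by_token(transcriptions: Sequence[str]) -> str:
    """Align every replicate to a pivot and take a majority vote per token."""
    pivot: List[str] = pick_consensus(transcriptions).split()
    votes = [Counter() for _ in pivot]
    for t in transcriptions:
        tokens = t.split()
        matcher = SequenceMatcher(a=pivot, b=tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Count only blocks that map pivot tokens one-to-one onto replicate tokens.
            if tag in ("equal", "replace") and (i2 - i1) == (j2 - j1):
                for k in range(i2 - i1):
                    votes[i1 + k][tokens[j1 + k]] += 1
    return " ".join(v.most_common(1)[0][0] for v in votes)


replicates = [
    "Naer Mt. Hood, Oregon",
    "Near Mt Hood, Oregon",
    "Near Mt. Hood, Oregan",
]
print(vote_by_token(replicates))  # Near Mt. Hood, Oregon
```

In this example every replicate contains an error, so picking a single replicate cannot give a clean result, while the per-token vote can. Insertions and deletions relative to the pivot are simply ignored here, which a real alignment tool would handle more carefully.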

Locality: Again, there is a range of approaches, but it is probably worth cleaning up the transcribed string before sending it to a geocoding service.
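
One possible cleanup pass is sketched below. The abbreviation table and the function name `clean_locality` are assumptions made for illustration; a real table would depend on the collection and language, and no particular geocoding service is implied.

```python
import re
import unicodedata

# Hypothetical abbreviation table; extend or replace it for a real collection.
ABBREVIATIONS = {
    "mt": "Mount",
    "co": "County",
    "nr": "near",
    "hwy": "Highway",
}


def clean_locality(raw: str) -> str:
    """Normalize a transcribed locality string before geocoding."""
    # Fold compatibility characters (e.g. non-breaking spaces) to plain forms.
    text = unicodedata.normalize("NFKC", raw)
    # Collapse runs of whitespace, including tabs and newlines.
    text = re.sub(r"\s+", " ", text).strip()

    def expand(match: "re.Match[str]") -> str:
        # Replace "Mt." style abbreviations; leave other words untouched.
        return ABBREVIATIONS.get(match.group(1).lower(), match.group(0))

    text = re.sub(r"\b([A-Za-z]+)\.", expand, text)
    # Remove stray spaces before commas and trailing punctuation.
    text = re.sub(r"\s+,", ",", text)
    return text.rstrip(" .,;:")


print(clean_locality("  nr.  Mt. Hood ,   Clackamas Co.,\tOregon. "))
# near Mount Hood, Clackamas County, Oregon
```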

Names: Processing will depend on the structure of the target database: maybe you just want one string, or maybe you want to separate the individual names. If the names are separated, they can be compared or linked to an outside list of collectors. That could also be part of a larger QA process: does the collection date make sense, given the life span of the collector? (GitHub code tries to do this.)
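
A rough sketch of name splitting, list lookup, and the life-span check follows. It is not the GitHub code mentioned above. The collector entries are invented purely for illustration, and the names `Collector`, `split_names`, and `check_collectors` are hypothetical. Splitting is done only on `;`, `&`, and the word "and", since commas are ambiguous in strings like "Doe, J.".

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Collector:
    name: str
    born: int  # birth year
    died: int  # death year


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so lookups tolerate small differences."""
    return re.sub(r"\s+", " ", name).strip().lower()


# Invented entries; a real list would come from an outside authority file.
COLLECTORS: Dict[str, Collector] = {
    normalize(c.name): c
    for c in [Collector("J. Doe", 1850, 1920), Collector("R. Roe", 1930, 1990)]
}


def split_names(raw: str) -> List[str]:
    """Split a transcribed collector string into individual names."""
    parts = re.split(r"\s*(?:;|&|\band\b)\s*", raw)
    return [p.strip() for p in parts if p.strip()]


def check_collectors(
    raw: str, year: int, collectors: Dict[str, Collector] = COLLECTORS
) -> List[Tuple[str, str]]:
    """Report, per name, whether it is known and whether the year is plausible."""
    report = []
    for name in split_names(raw):
        collector = collectors.get(normalize(name))
        if collector is None:
            status = "not in list"
        elif collector.born <= year <= collector.died:
            status = "ok"
        else:
            status = "date outside life span"
        report.append((name, status))
    return report


print(check_collectors("J. Doe & R. Roe; A. Nother", 1910))
# [('J. Doe', 'ok'), ('R. Roe', 'date outside life span'), ('A. Nother', 'not in list')]
```

Exact matching after normalization is the simplest choice; fuzzy matching against the collector list would catch more transcription errors at the cost of false links.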

