Transcription Hackathon: Reconciliation of Replicates Planning
We worked on tools to help with reconciling and interpreting crowd-sourced data. One possible workflow might go like this:
Start with crowd-sourced transcriptions → reconcile (→ filter out irreconcilables?)
- If locality: → place name matching → geocoding
- If names: → name splitting → name list lookup
Reconciliation: a range of approaches:
- Get a super-user to finalize/approve transcriptions, instead of trying to resolve multiple submissions.
- Or, given multiple transcriptions, pick the one that minimizes some edit distance to the others (for example, the smallest total Levenshtein distance).
- Or, use sequence-alignment tools to find the best transcription of each subregion within a longer string. (The GitHub code does this.)
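A minimal sketch of the edit-distance option, assuming plain Levenshtein distance as the metric and taking "best" to mean the replicate with the smallest total distance to all the others. This is an illustration, not the method used in the GitHub code; `levenshtein` and `pick_consensus` are hypothetical names.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute; each costs 1)."""
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def pick_consensus(transcriptions: List[str]) -> str:
    """Return the transcription with the smallest total edit distance to all replicates."""
    if not transcriptions:
        raise ValueError("need at least one transcription")
    return min(
        transcriptions,
        key=lambda t: sum(levenshtein(t, other) for other in transcriptions),
    )


if __name__ == "__main__":
    replicates = ["Mt. Washington, NH", "Mt Washington, NH", "Mt. Washingten, NH"]
    print(pick_consensus(replicates))  # → Mt. Washington, NH
```

This compares every pair, which is fine for the handful of replicates a crowd-sourcing project typically collects per field. A cut-off on the winning total (not shown) is one way to route irreconcilable sets to manual review.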
Locality: Again, there is a range of approaches, but we probably want to clean up the transcribed string before sending it to a geocoding service.
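A clean-up pass might look like the following sketch. The abbreviation table and the particular normalization steps are assumptions for illustration, not requirements of any specific geocoding service, and the code does not call a geocoder itself.

```python
import re
import unicodedata

# Hypothetical abbreviation table (lower-case keys); a real project would
# build and review its own from the labels it actually sees.
ABBREVIATIONS = {
    "mt": "Mount",
    "ft": "Fort",
    "co": "County",
    "nr": "near",
}

# A whole ASCII word, optionally followed by a period ("Mt." or "Mt").
_WORD = re.compile(r"\b([A-Za-z]+)\b\.?")


def _expand(match: re.Match) -> str:
    """Replace a known abbreviation; otherwise return the matched text unchanged."""
    return ABBREVIATIONS.get(match.group(1).lower(), match.group(0))


def clean_locality(raw: str) -> str:
    """Normalize a transcribed locality string before it is sent to a geocoder."""
    # Compose Unicode characters consistently (e.g. an accented letter as one code point).
    text = unicodedata.normalize("NFC", raw)
    # Collapse runs of whitespace, including line breaks copied from labels.
    text = re.sub(r"\s+", " ", text).strip()
    # Expand abbreviations word by word.
    return _WORD.sub(_expand, text)


if __name__ == "__main__":
    print(clean_locality("  Mt.  Washington,\nCoos Co. "))  # → Mount Washington, Coos County
```

Blind expansion can be wrong ("Co." may mean "County" or "Company"), so the table is the part most worth tuning against real data.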
Names: Processing will depend on the target database structure: maybe you just want one string, or maybe you want to separate the individual names. If the names are separated, they can be compared with, or linked to, an outside list of collectors. (That could be part of a larger QA process: does the collection date make sense, given the collector's life span?) (The GitHub code tries to do this.)
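A sketch of name splitting plus the life-span check, assuming a hypothetical collector list held in memory as a dict keyed by normalized name. The separator set, the `Collector` record, and the helper names are illustrative assumptions, not the GitHub code's API.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Collector:
    name: str
    born: Optional[int]  # birth year, if known
    died: Optional[int]  # death year, if known


# Hypothetical separators between collector names. Commas are deliberately
# excluded because they often appear inside a single name ("Smith, J.").
_SEPARATORS = re.compile(r"\s*(?:;|&|\band\b|\bwith\b)\s*", re.IGNORECASE)


def split_names(raw: str) -> List[str]:
    """Split a transcribed collector field into individual names."""
    return [part for part in _SEPARATORS.split(raw) if part]


def normalize(name: str) -> str:
    """Drop periods, collapse whitespace, and lower-case, for loose matching."""
    return re.sub(r"\s+", " ", name.replace(".", "")).strip().lower()


def check_collection_year(
    names: List[str], year: int, collectors: Dict[str, Collector]
) -> List[str]:
    """Return QA warnings for unknown collectors or dates outside a known life span."""
    warnings = []
    for name in names:
        person = collectors.get(normalize(name))
        if person is None:
            warnings.append(f"{name!r}: not found in collector list")
        elif person.born is not None and year < person.born:
            warnings.append(f"{name!r}: collected {year}, before birth ({person.born})")
        elif person.died is not None and year > person.died:
            warnings.append(f"{name!r}: collected {year}, after death ({person.died})")
    return warnings


if __name__ == "__main__":
    # Hypothetical collector record, not real data.
    collectors = {normalize("Jane Doe"): Collector("Jane Doe", 1850, 1910)}
    names = split_names("Jane Doe & J. Smith")
    for warning in check_collection_year(names, 1925, collectors):
        print(warning)
```

Failed lookups are reported rather than guessed, so they can join the same review queue as irreconcilable transcriptions.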