More data than we know what to do with? Biodiversity informatics skills needs in the research data pipeline

Authors: Deborah Paul, François Michonneau, Katja Seltmann

Abstract: Scientists need ever-faster access to more and richer data. Projects like iDigBio, GBIF, VertNet, and others are addressing this need. A search in one of these data sources may yield millions of records. Researchers are discovering they need new skills in their scientific workflows to work with such large data sets. At various workshops about using biodiversity data in research, we hear:

I’ve borrowed my colleague’s computer.
I’m running analysis on three different computers.
Excel is a database, isn’t it?
Should I learn R? or Python? Is it worth my time?
How do I visualize this data?
What’s the best way to share data with colleagues?
How do I work with a txt file?
How do I use APIs to enhance my research, …

Bottlenecks. If we only address skills development and enhancement for museum collection and data managers, data coming into collections will continue to have the same data issues (no or few globally unique identifiers, non-standard dates and names, encoding issues, leading/trailing whitespace, non-standard headers, etc.). And if we do not address the researcher’s data literacy and skill set, we’ll continue to have issues with resulting research data sets that are difficult to discover, and difficult or impossible to re-use.
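To make two of the data issues named above concrete, here is a minimal sketch of how leading/trailing whitespace and non-standard dates might be flagged in a single record. It is only an illustration, not a tool from the workshops: the function names, the sample record, and the choice of `eventDate` as the date field and of YYYY-MM-DD as the "standard" date format are all assumptions.

```python
import re
from datetime import datetime

# Assumption: a "standard" date here means a full ISO 8601 calendar date (YYYY-MM-DD).
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_iso_date(text):
    """Return True if text is a valid calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.match(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # Right shape but not a real date, e.g. 2015-02-30.
        return False
    return True


def find_issues(record, date_field="eventDate"):
    """Return (field, problem) pairs for one record, given as a dict of strings.

    The date_field default is a hypothetical choice for this example.
    """
    issues = []
    for field, value in record.items():
        if value != value.strip():
            issues.append((field, "leading/trailing whitespace"))
    date_value = record.get(date_field, "").strip()
    if date_value and not is_iso_date(date_value):
        issues.append((date_field, "non-ISO date"))
    return issues


# Hypothetical record with both problems.
sample = {"scientificName": " Apis mellifera ", "eventDate": "9/29/2015"}
print(find_issues(sample))
# [('scientificName', 'leading/trailing whitespace'), ('eventDate', 'non-ISO date')]
```

A check like this only reports problems; deciding how to repair an ambiguous date such as 9/29/2015 versus 29/9/2015 still needs knowledge of where the data came from.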

Researchers and data mobilizers need computational and data literacy skills in data standards, data management across the data lifecycle, data visualization, and data archiving. For the last 5 years, we’ve been learning a lot at iDigBio about these needs in our community through our workshops and collaboration with groups such as Data Carpentry and DataONE. This talk provides an overview of what we've learned from our workshops, what materials we've developed along the way, and what's in store for the future.

Where: Biodiversity Information Standards (TDWG) Conference 2015

In TDWG Symposium S02 - Formal biodiversity informatics training as key for capacity building: state-of-play, challenges and opportunities

Building: Windsor Hotel
Room: Oak Room
Date: Tuesday, 2015-09-29 12:00 PM – 12:15 PM (Nairobi time)

Start Date: Tuesday, September 29, 2015 - 5:00am to 5:15am EDT
BIS (TDWG) 2015, Nairobi, Kenya