iDigBio and Data Carpentry go to Africa

Location: BIS (TDWG) 2015 Biodiversity Information Standards:
An amazing 2 weeks in Nairobi, Kenya.
by Deb Paul, input from Libby Ellwood and Matt Collins.

For the first time ever, the Biodiversity Information Standards (TDWG) Conference took place on the continent of Africa, in Nairobi, Kenya. Every 3 years, TDWG holds the annual meeting in a developing nation and uses the opportunity to provide biodiversity informatics training in the week before the conference.

The Biodiversity Information Standards (TDWG) group brings us the standards (like Darwin Core) and applications we use to successfully share our biodiversity data.

iDigBio staff members, Matt Collins, Libby Ellwood, Kevin Love, and Deb Paul (that’s me), first headed to the BIS (TDWG) Training Week held at Multi-Media University (MMU), bordering Nairobi National Park. On day one, we jumped in with Training Week Organizer, Henry (Hank) Bart, as assistants in the GEOLocate Training.

On Wednesday and Thursday, it was time for a Data Carpentry (DC) Workshop. Matt and I worked together on this as part of a strategic plan to pair the workshop with a presentation at #TDWG15 explaining the Data Carpentry data model. All of this was made possible by funding from the Gordon and Betty Moore Foundation for Data Carpentry, the JRS Foundation for participant support, and iDigBio, with the goal of providing training and skills that empower our participants to provide better data for the future. At the same time, we hope that by explaining how to get involved in DC, we can help our African colleagues scale up their efforts to bring leading-edge biodiversity informatics data management and mobilization skills to as many of their colleagues as possible.

Someone asked, “Why are you teaching biologists programming?” We’re not. At Data Carpentry, we’re teaching basic scripting and introductory skills that empower biologists to manipulate their data in a manner that supports reproducible research.

Twenty-four participants from 12 African countries (Zimbabwe, Benin, Ghana, Nigeria, Rwanda, Ethiopia, South Africa, Madagascar, Kenya, Guinea, Cameroon, and the Democratic Republic of the Congo) joined us for the DC experience. The group represented a rich array of scientific disciplines and roles, including ecology, data management, environmental information systems, forestry (silviculture, forest ecology and management), waterbird ecology and biodiversity informatics, biology and genetics, plant systematics and taxonomy, herpetology, earth science, conservation ecology, and microbiology.

In addition to the many new Homo sapiens sapiens (human) friends we made during the DC workshop, we also got to know Cercopithecus albogularis (Sykes' monkey), Cercopithecus aethiops (Grivet monkey), Papio anubis (Olive Baboon), Colobus guereza (Eastern Black & White Colobus), and birds like Leptoptilos crumenifer (Stork), the imposing and impressive Sagittarius serpentarius (Secretary bird), Struthio camelus (aka Ostrich), Threskiornis aethiopicus (Sacred Ibis), the wonderfully unique Upupa epops (hoopoe, that’s “hupu”), Bostrychia hagedash (Hadada Ibis), Colius striatus (Speckled Mousebird), Plocepasser mahali (White-browed Sparrow Weaver), Lamprotornis superbus (Superb Starling), Corvus albus (Pied crow), and the unforgettable Phacochoerus africanus, aka warthogs, just to name a few. Of course, we also saw giraffe, elephants, lions, and many different kinds of antelope. Yes, entomologists and ichthyologists, we did see insects and fish too!

Our two-day DC workshop, designed for novices, covered data organization in spreadsheets, an introduction to Open Refine, an introduction to R, and data analysis and visualization in R. Originally, we planned to present an introduction to SQL, but for several reasons beyond our control, we were not able to get to this topic. We hope to figure out how to provide this training in the future, perhaps online. All materials for the workshop can be found on our GitHub pages at: A Data Carpentry Workshop at Multimedia University of Kenya.

To keep track of our Data Carpentry efforts and set goals for the future, we run pre- and post-workshop assessment surveys. 88% of our participants said they gained a “great deal” from participating in this DC event. When asked about their data management skills following the workshop, 77% said they had higher or much higher skills. 100% of the 17 survey respondents agreed or strongly agreed they could immediately apply what was learned at this workshop. Looking to the future, many of the participants indicated they wish to bring DC to their stakeholders across Africa.

After the workshop, participants told us they were immediately able to provide better data to collections data aggregators like iDigBio, GBIF, and VertNet.

While we were sad to see the biodiversity informatics training week end, it was time for BIS (TDWG) 2015. Thanks to JRS Foundation funding, all the JRS-funded training week participants also joined us and participated in BIS (TDWG) 2015. This support greatly enhanced BIS (TDWG), adding opportunities to find collaborative research ideas, develop collegial relationships, and increase involvement in BIS (TDWG) in the future. You can read more about our Data Carpentry experience in the blog post at Data Carpentry by iDigBio colleague Matt Collins.

One of BIS (TDWG)’s Interest / Task Groups is working on refining what we know about capacity-building needs and collaborating on this topic. Dmitris Koureas is part of this group and put together a compelling symposium on this topic:

Professional and formal biodiversity informatics training as a key for capacity building: state-of-play, challenges

Dmitris’ talk captured the essence of this challenge and provided some very compelling data showing the need for training if we are to be successful in our current efforts to more effectively mobilize collections’ data and sustain this initiative going forward. Listen for yourself to Introducing data training programmes in large natural history museums to tackle their digital challenges (mp4). It’s not enough to purchase computers and provide collections management software and cameras. Our researchers, collections staff, and data management staff need skills and knowledge that facilitate giving data a life beyond the drawer, jar, and cabinet. From iDigBio, I got to contribute to this session by sharing what we’re learning about capacity-building from the iDigBio and Data Carpentry Workshop experiences in a talk titled More data than we know what to do with? Biodiversity informatics skills needs in the research data pipeline (co-authors: Deborah Paul, François Michonneau, Katja Seltmann). One of the most challenging issues for this community is how to scale up training to meet the demand. It’s clear that quite a few projects across the planet are trying their best to teach these skills to their stakeholders and are also trying to figure out how to reach more individuals.

Libby and I also organized a session at TDWG focused on Biodiversity Data Mobilization Models. In this session, Libby highlighted the ways citizen scientists can contribute to data mobilization through online transcription tools, collaborative georeferencing projects, and annotation possibilities. Mary Barkworth followed with a presentation of the research and educational resources of Open Herbarium. Matt Collins rounded out the first half of the session with a presentation about the DC Model, including a short demonstration of a lesson in R, Data Carpentry style. (It's great to have colleagues traveling together. At this point in the conference, Deb was just a bit under the weather, so Matt stepped up and gave this presentation. Thanks, Matt!) Nicky Nicolson described a practical application within the Open Refine framework. Jean Ganglo closed the session with an example of mobilizing plant specimen data in Benin. Together, these talks offered perspectives from several countries and described the use of numerous tools and platforms.

If you would like to view recordings of these talks, or for that matter, any from TDWG 2015, you can do so here:

Matt also presented a poster co-authored with Alex Thompson and Jorrit Poelen on using Spark, a big data computing framework, with iDigBio data: Whole-dataset analyses using Apache Spark. A blog post providing background on the poster is currently on the iDigBio web site. Libby Ellwood presented Mapping Life: Quality Assessment of Novice and Computer Automated vs. Expert Georeferences with co-authors Henry Bart, Michael Doosey, Dean Jue, Gil Nelson, Nelson Rios, and Austin Mast. Listen to Libby’s presentation to find out what the research tells us about how georeferences produced by experts differ from those produced by beginners with just a bit of training.

We’re already looking forward to BIS (TDWG) 2016 in Costa Rica. We hope to see many of the African participants there. How about a biodiversity informatics training week before every BIS meeting? What about having SPNHC and BIS meet together every few years? Our missions are mutually beneficial, and meeting together would foster and simplify collaboration and application development. From a digitization and data use perspective, we are looking forward to #BIS2016 and your part in it.

Thanks for reading,

Deb Paul, Matt Collins, Libby Ellwood, Kevin Love, et al. at iDigBio