Georeferencing and visualizing biodiversity data for research

Contributed by Deborah Paul (iDigBio – FSU), Shelley James (iDigBio – UF)

In October 2016, in the heart of Santa Barbara, California, 30 eager collections personnel and researchers came together to learn tricks and tools for assigning geopoints to research biodiversity datasets, writing precise locality descriptions for new specimen collections, and using software to visualize and assess the quality of biodiversity data. The 4-day short course, entitled Georeferencing for Research Use (GRU), was hosted by the National Center for Ecological Analysis and Synthesis (NCEAS) and co-sponsored by iDigBio and the Cheadle Center for Biodiversity and Ecological Restoration (CCBER).

Our vision for this course started with the need to go beyond best practices for georeferencing legacy locality data from collections specimens, as materials and human resources for this are already abundant (see the Train-the-Trainers materials at iDigBio, for example). We took an ecosystem point of view, considering both data producers and data consumers, and brought researchers, collection managers, and data managers together for this training workshop.

After more than a year of planning, the GRU planning team condensed the best practices for georeferencing legacy data into two days (thanks, David Bloom and Jessica Utrup), leaving the remaining two days to venture into the skills and knowledge needed for robust assessment of the fitness-for-use of geospatial data. Using a real collections dataset from iDigBio, we investigated how to find and fix data quality issues, focusing on date (time), taxon name, and locality data. Many of the skills needed to do this are the same for both researchers and data managers. Some participants also brought their own datasets. The overall goal was to show researchers and managers how to arrive confidently at a dataset that is ready for a research method and downstream analysis, such as Ecological Niche Modelling.
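To give a flavor of the kind of date, taxon name, and locality checks described above, here is a minimal sketch in Python. It is not the course material itself: the flag names and the `flag_record` function are hypothetical, the field names assume Darwin Core terms (`scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`), and dates are assumed to be full ISO 8601 calendar dates.

```python
from datetime import date


def flag_record(rec):
    """Return a list of data quality flags for one record.

    `rec` is a dict keyed by Darwin Core term names. The flag names are
    made up for this sketch; they are not iDigBio's own data quality flags.
    """
    flags = []

    # Taxon name: flag records with no usable scientific name.
    if not (rec.get("scientificName") or "").strip():
        flags.append("taxon_missing")

    # Date: assume a full ISO 8601 calendar date such as 1975-06-14.
    raw_date = (rec.get("eventDate") or "").strip()
    try:
        collected = date.fromisoformat(raw_date)
    except ValueError:
        flags.append("date_unparseable")
    else:
        if collected > date.today():
            flags.append("date_in_future")

    # Locality: coordinates must be numeric, in range, and not (0, 0).
    try:
        lat = float(rec.get("decimalLatitude"))
        lon = float(rec.get("decimalLongitude"))
    except (TypeError, ValueError):
        flags.append("coordinates_missing")
    else:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            flags.append("coordinates_out_of_range")
        elif lat == 0 and lon == 0:
            flags.append("coordinates_zero")

    return flags
```

Running `flag_record` over every row of a spreadsheet export gives a quick list of records to review by hand; the same checks can be done interactively with facets in OpenRefine or filters in a spreadsheet.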

We worked with a Carabidae (beetle) dataset from iDigBio to show what researchers might be looking for in the georeferenced and related data before research methods are applied. The participants learned about each other’s needs. We illustrated the issues inherent in transforming a text description of a location into coordinates, and participants also learned how to provide better, born-digital, standardized locality data from any future collecting events, avoiding the issues of legacy data. The participants and instructors all learned new ways to visualize and clean their data using QGIS, as well as some new tricks in spreadsheets and OpenRefine.

Dataset description: all records we have for the family Carabidae (beetles) where dwc:stateProvince='california' and a georeference exists for each record. Try an iDigBio API search for carabidae+california+geopoint to see the current result of this search. See the search results in the portal or download the actual dataset we used.
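For readers who would like to try the API search from a script, here is a rough sketch in Python. It assumes the iDigBio v2 record search endpoint, its JSON `rq` query parameter with an "exists" test on `geopoint`, and an `itemCount` field in the response; please check these against the current iDigBio API documentation. The function names are our own, for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the iDigBio v2 record search API.
SEARCH_URL = "https://search.idigbio.org/v2/search/records"


def build_query(family="carabidae", state="california"):
    """Build the record query: family + state + a geopoint must exist."""
    return {
        "family": family,
        "stateprovince": state,
        "geopoint": {"type": "exists"},
    }


def build_search_url(query, limit=5):
    """Encode the query as the `rq` parameter of a search URL."""
    return SEARCH_URL + "?" + urlencode({"rq": json.dumps(query), "limit": limit})


def count_matches(url):
    """Fetch the search result and return the number of matching records.

    Needs network access; `itemCount` is assumed from the v2 response format.
    """
    with urlopen(url, timeout=30) as response:
        return json.load(response)["itemCount"]
```

Calling `count_matches(build_search_url(build_query()))` should return the current number of georeferenced California Carabidae records, which will probably differ from the count in the dataset we used at the workshop as new records are added.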

Through discussions, we learned a lot about the skills and knowledge needs of the broader community for facilitating more reproducible geospatial research. Our group is currently writing a paper to share our insights with the greater collections and research community. Other outcomes in the works include a webinar on issues with the creation and use of polygons (13 December 2016), webinars on the data quality flags in iDigBio (2017), and an online version of many of the skills, captured in a Data Carpentry repository on GitHub, including a geospatial lesson (in development). Recordings will also be available, which should be especially useful for those who took the course and want to revisit the content.

Many kind thanks go to the planning team, who worked very hard on this workshop. It was new territory for many. We could not have done this workshop without the skills and dedication of GIS graduate student Sara Lafia at the University of California Santa Barbara (UCSB), in collaboration with Katja Seltmann, Director of the Cheadle Center for Biodiversity and Ecological Restoration (CCBER). NCEAS staffer Mark Schildauer and others provided mentorship and insights for Sara’s detailed lessons on QGIS (open source Geographic Information System software). Michael Yost (MaCC TCN Georeferencing) gets kudos for a thoughtful lesson on the creation of polygons for expressing uncertainty. Much gratitude also goes to Una Farrell (Research Lab Coordinator, Department of Geological Sciences, Stanford) for stepping in when we really needed her, and to David Bloom (VertNet Coordinator/iDigBio Data Mobilization Specialist), the mystery remote voice of Jessica Utrup (Yale Peabody Museum Technical Assistant), Nelson Rios (GEOLocate Developer, Tulane), and Shelley James (iDigBio Data Management Coordinator) for making all the pieces come together.

It should be noted too that NCEAS is a wonderful place to hold a workshop. The staff could not be more gracious and helpful, and the setting is very conducive to collaboration. Special thanks to NCEAS staffers Thomas Hetmank and Ginger Gillquist. After the workshop, some of us were lucky enough to visit Coal Oil Point and tour the CCBER collections at UCSB. And on that note, what is in the image in the bottom right of the photograph?