TDWG 2016: Highlights for biodiversity research
The Biodiversity Information Standards (TDWG) annual meeting in 2016 had the theme of "Standards Supporting Innovation in Biodiversity and Conservation". Understanding the use of biodiversity standards, and having clear and concise documentation, is essential for the creation, aggregation and downstream use of biodiversity data, and it is exciting to see the diverse TDWG community helping to clarify and expand on the existing data standards. Taxonomy, taxonomic resolution services and identifiers, and efforts to embrace concept reasoning were discussed widely in sessions (Go Nico Franz!) throughout the meeting. The data standards needs of paleobiology also resonated in many sessions. Textual data mining and trait analysis were hot topics as increasing amounts of biodiversity data become mobilized, and folks are still passionate about ontologies. While not every topic in the program might be your cup of tea, as a biodiversity researcher or biocollections professional it is worth reviewing the wealth of activities taking place within the TDWG community, both to be informed and to become more involved. Here are some valuable research highlights from the meeting.
OpenRefine Workshop. Dimitri Brosens & Peter Desmet led an excellent workshop on "How to standardize a dataset to Darwin Core with OpenRefine", which you can follow along with through their online tutorial. The workshop covered the basics of importing a dataset into this free and open source software and creating a project, faceting data and using clustering algorithms to identify and correct errors and inaccuracies in the data, calling reconciliation services for taxonomic name matching or reverse georeferencing, and finally reusing the operation history. Highly recommended by all who attended - and even non-novices in the room learned something new!
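To give a feel for what clustering does, here is a minimal Python sketch of key-collision clustering, written in the spirit of a "fingerprint" method. It is not the workshop's material and not OpenRefine's own code; the function names and the sample names are assumptions made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-duplicate spellings collide."""
    # Strip accents by decomposing characters and dropping non-ASCII parts.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, remove punctuation, then split on any whitespace.
    tokens = re.sub(r"[^\w\s]", "", ascii_text.lower()).split()
    # Sort and de-duplicate tokens so that word order does not matter.
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical messy values, as might appear in a scientificName column.
names = [
    "Puma concolor",
    "puma concolor",
    "Puma concolor.",
    "Concolor, Puma",
    "Panthera leo",
]
clusters = cluster(names)
```

Here the four spellings of Puma concolor end up in one cluster, while "Panthera leo" has no variants and is left out. In practice you would review each cluster and pick the value to keep, rather than merging automatically.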
Annosys. Did you know GBIF has implemented the Annosys open-source system for annotating, correcting and enriching taxonomic data associated with specimen occurrence records? Have you found a specimen record that could use some tweaking? On the GBIF specimen record page, look for the blue "Annotate" button on the right of the page; sign in and help with specimen curation. Take a look at Walter Berendsohn's talk for more information, including how this annotation system could be incorporated into your biodiversity data portal.
Advances in fitness-for-use assessment of biodiversity data. The TDWG Biodiversity Data Quality Interest Group is working through three task groups: developing a data quality and fitness-for-use standard with a vocabulary/terminology framework (Task Group 1 - TG1); assessing the tools already available and used by the community, particularly by data aggregators, such as the data flags provided by iDigBio (TG2); and identifying the use cases needed to capture community needs and to assess fitness-for-use tools and judgments on a single record or an entire dataset (TG3). This community group, which comprises data aggregators including iDigBio, researchers and collections data managers, has completed much work to date, including a draft framework for fitness for use with Darwin Core as a starting point, a comprehensive assessment of the data quality tests and assertions already implemented by data aggregators, and the beginnings of a library of fitness-for-use case studies/stories to be used in developing fitness-for-use profiles and a standard methodology. Several sessions (see Sessions S06A and S06B) provide informative coverage of the progress made to date.
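For readers new to the idea, the difference between a data quality test and a fitness-for-use judgment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names are Darwin Core terms, but the flag names, the checks and the functions are hypothetical and are not the Interest Group's framework or any aggregator's actual flags.

```python
from datetime import date


def assess_record(record: dict) -> list:
    """Return hypothetical quality flags for one Darwin Core style record."""
    flags = []

    # Coordinates must parse as numbers and fall within valid ranges.
    try:
        lat = float(record.get("decimalLatitude", ""))
        lon = float(record.get("decimalLongitude", ""))
    except (TypeError, ValueError):
        flags.append("COORDINATES_MISSING_OR_INVALID")
    else:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            flags.append("COORDINATES_OUT_OF_RANGE")
        elif lat == 0 and lon == 0:
            flags.append("COORDINATES_ZERO")

    # This sketch only accepts eventDate as a full ISO date (YYYY-MM-DD).
    try:
        event_date = date.fromisoformat(record.get("eventDate", ""))
    except (TypeError, ValueError):
        flags.append("EVENT_DATE_MISSING_OR_INVALID")
    else:
        if event_date > date.today():
            flags.append("EVENT_DATE_IN_FUTURE")

    if not str(record.get("scientificName", "")).strip():
        flags.append("SCIENTIFIC_NAME_MISSING")

    return flags


def is_fit_for_use(record: dict, blocking_flags: set) -> bool:
    """A record is fit for a given use if none of that use's blocking flags fire."""
    return not set(assess_record(record)) & set(blocking_flags)


# Made-up example records.
good = {
    "scientificName": "Puma concolor",
    "decimalLatitude": "29.65",
    "decimalLongitude": "-82.32",
    "eventDate": "2016-12-05",
}
bad = {
    "scientificName": "",
    "decimalLatitude": "0",
    "decimalLongitude": "0",
    "eventDate": "2016-13-40",
}
```

The point of the second function is that the same flags can matter for one use and not for another: a record with doubtful coordinates may be useless for distribution modelling yet perfectly fine for a taxonomic checklist.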
The Interest Group wishes to capture data user stories in practical, plain language in order to better understand the specific needs of biodiversity data users. Past and present, successful and failed attempts to use biodiversity data are equally welcome. The ultimate goal is to improve the research and collections community's experience of using data. To do this, the group needs to gather a public collection of data use examples - or stories - that demonstrates the spectrum of relevance, purposes and analyses of primary biodiversity data, and that is helpful for data providers, data aggregators, and you, the user.
Do you have a biodiversity data use story to tell? Help with the development of data quality and fitness-for-use metrics by submitting your User Stories.
Big data analysis - methods and techniques as applied to biocollections was an informative session hosted by iDigBio. Take a look at the blog produced by the iDigBio team, and watch the recordings from the session.
Online publishing is rapidly evolving, and the Pensoft team provided an informative workshop highlighting new approaches and tools in Open Data publishing for biodiversity. They also demonstrated nanopublications for biodiversity, which may help to speed up the documentation of new species, distribution outliers or phenological discrepancies.
Data visualization is increasingly important for discovery, exploration and informative publication as biodiversity datasets increase exponentially in size, and it was highlighted throughout the meeting and in a dedicated session. Arturo Ariño gave a marvellous analysis of how biodiversity and the data standards community have evolved since the inception of TDWG in 1985, and the Cornell Lab and eBird provided some excellent demonstrations of the dynamic visualizations of bird migrations they are now able to generate. Linking back to conservation, identifying gaps in data and developing inventories, checklists and catalogs all have practical, on-the-ground implications for understanding and protecting the world's biodiversity.
Citizen Science. As highlighted in Arturo Ariño’s analysis of the TDWG community, the role of citizen science is growing in our research. This was reflected in two citizen science symposia, "Citizen Science for Biodiversity Research" and "Interoperability and data provenance for online crowdsourcing of biodiversity specimens"; an iDigBio presentation, “Worldwide Engagement for Digitizing Biocollections (WeDigBio)—Our Biocollections Community’s Citizen Science Space on the Calendar”; a discussion group, “Defining Infrastructure Needs & Standards to Increase Global Monitoring Through Citizen Science”; other talks dispersed in various symposia; and numerous posters. These presentations spanned a range of topics, from the theoretical and philosophical to the methodological and analytical. Perhaps one of the strongest signs that citizen science is gaining ground in TDWG is the reinvigoration of the Citizen Science Interest Group. Rob Stevenson (UMass-Boston) and Libby Ellwood (iDigBio) are co-convening this group and reestablishing it as an active part of TDWG. Stay tuned for details and information about joining this interest group!