Research uses of online biodiversity data
Contributed by Joan Damerow, Postdoctoral Researcher, Field Museum of Natural History
Recent headlines warn of an “insect apocalypse” and “silent skies” as studies reveal plummeting populations of insects and birds. In 2018, scientists discovered that certain insect populations have declined by as much as 80% since 1989. A new study in Science demonstrated a 29% decline in bird populations since 1970. Reporters outline human experiences that lend credence to these findings; many people have expressed a general feeling of “missing something” as they drive through the countryside without bug stains on the windshield. However, these shocking discoveries were only possible where sufficient long-term data exist to evaluate shifts in biomass and abundance. Species occurrence and abundance data provide the hard evidence that we may need to inspire positive change, especially high-quality data that are standardized, published, and openly accessible.
Long-term population datasets, collected using the same methods over time, are rare and usually isolated to small regions and specific organisms. However, online databases and aggregators with opportunistic species records have grown rapidly in recent years. For example, the Global Biodiversity Information Facility (GBIF) has ballooned from just over 200 million records in 2010 to over 1.3 billion today. These records represent millions of species of plants, fungi, vertebrates, insects, spiders, gastropods, crustaceans, and more. This growth is thanks to efforts like iDigBio to digitize natural history collections, and to the rise of citizen science platforms for recording species occurrences (e.g., eBird and iNaturalist). These online records contain priceless historical data, and we wanted to find out how scientists are using them.

Pinned butterfly specimens from the Field Museum of Natural History. Collection data from insect specimens at the Field Museum are also published online in GBIF (www.gbif.org). Credit: Field Museum, John Weinstein
In a recent paper (https://doi.org/10.1371/journal.pone.0215794), we explore how researchers have reused species occurrence data since 2010. We examined 501 papers that use openly accessible biodiversity data and assigned specific tags to each based on: online database(s) used, organisms addressed, research use of the data, other data types linked to species occurrence records, and data quality issues addressed.

Scientific specimens stored in alcohol in the Field Museum's collections and used in research. Credit: Field Museum, John Weinstein
Species occurrence data have fairly clear and common use cases in distribution modeling and biodiversity assessments. Indeed, we found that the most common uses of online biodiversity databases have been to estimate species distribution and richness, to outline data compilation and publication efforts, and to assist in developing species checklists or describing new species. We documented 31 different general research uses in our study. Papers with the highest mean number of citations per year involved more applied studies in disease ecology, public health, documenting extinctions, developing new methods to deal with species occurrence data, and citizen science.
Specific uses varied based on the type of organism involved. We found that studies using online species records were most common for plants. However, the average vertebrate species may generally be more suitable for distribution studies because vertebrates are less diverse and many vertebrate collections are completely digitized. The most active citizen science communities focus on bird observations (eBird), and data for individual vertebrate species are more likely to contain enough records for accurate distribution models. Conservation studies are also more common for vertebrates, likely because vertebrates are disproportionately represented in threat assessments. In contrast, highly diverse invertebrates are more likely to be the subject of foundational biodiversity studies, such as taxonomy, barcoding, and data papers.

An example of bird specimens kept behind the scenes at the Field Museum, with tags telling what species they are, where they were collected, and when. Credit: Field Museum
Current problems with biodiversity data reuse highlight areas for improvement and outreach. A high proportion of papers in our study did not sufficiently cite databases, and many databases were no longer accessible at the time of this work. Continued efforts in data preservation and promoting best practices in data citation are essential for advancing scientific reproducibility and encouraging publication of high-quality biodiversity data. Only 69% of papers in our dataset addressed one or more aspects of data quality, which is low considering the common errors and biases known to exist in opportunistic datasets. Most of these papers updated nomenclature using online sources and corrected spatial errors; potential errors and biases that require expert input were less often addressed. Improving automated solutions to flag errors, along with efficient mechanisms to report and correct data quality issues, will be essential for advancing the relevance and broadest use of species occurrence data.
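To make the idea of automated error flagging concrete, here is a minimal sketch in Python of the kind of basic spatial check a researcher might run on downloaded occurrence records. It is an illustration only, not the method used in our paper or by any particular database. The function name and flag labels are hypothetical; the field names decimalLatitude and decimalLongitude follow the Darwin Core standard commonly used for occurrence data, and each record is assumed to be a plain dictionary.

```python
from typing import List


def flag_coordinate_issues(record: dict) -> List[str]:
    """Return quality flags for one occurrence record (hypothetical helper).

    Assumes the record stores coordinates under the Darwin Core terms
    'decimalLatitude' and 'decimalLongitude'.
    """
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")

    # Records without coordinates cannot be mapped at all.
    if lat is None or lon is None:
        return ["missing_coordinates"]

    # Coordinates sometimes arrive as text; reject values that are not numbers.
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return ["non_numeric_coordinates"]

    flags = []
    # Latitude must lie in [-90, 90] and longitude in [-180, 180].
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        flags.append("coordinates_out_of_range")
    # A point at exactly (0, 0) is usually a placeholder, not a real locality.
    if lat == 0 and lon == 0:
        flags.append("zero_coordinates")
    return flags


# Made-up example records, for illustration only.
records = [
    {"decimalLatitude": 41.87, "decimalLongitude": -87.62},
    {"decimalLatitude": 0, "decimalLongitude": 0},
    {"decimalLatitude": 141.87, "decimalLongitude": -87.62},
    {"decimalLatitude": None, "decimalLongitude": -87.62},
]

for rec in records:
    print(rec, flag_coordinate_issues(rec))
```

Checks like these catch only the most mechanical problems. Errors that depend on knowing where a species actually lives, or how a collector recorded a locality, still require expert review.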
Despite great progress, we are still in the initial stages of compiling biodiversity data globally and across taxonomic groups. New approaches that integrate data types are restricted to certain plants and animals, and to regions with higher numbers of quality records. We still have work to do in promoting digitization of biocollections data; this is particularly true for insects, which have the highest number of species and specimens. Continued data publication, enhancement, and quality control efforts are necessary to make biodiversity science more efficient and relevant in our fast-changing environment.