The Momentum for Integrating Collections and Ecological Research: Expanding Collections Stakeholders and Imagining Future Data Needs

by Deborah Paul, Libby Ellwood, Christina Alba, Larry Page
with contributions from our speakers: Dave Tazik, Jennifer McGuire, Anna Monfils, Barry Sinervo, and Elizabeth Martin; and from some participants present at this symposium including (at least): Vince Smith, Mary Klein, Herrick Brown, and Jason Knouft

‘What do we need to leave behind today to position our future selves for success?’

The Integrating Collections and Ecological Research (ICER) working group began after last year’s ADBC Summit 2016. Our goals include examining how the collections community can better meet the data needs of ecologists, working out how we can disseminate relevant research-ready data to ecologists, and gathering and sharing current and compelling uses of collections data that serve ecological research and conservation needs.

At this year’s ADBC Summit 2017, the ICER group assembled a half-day symposium bringing together five speakers to connect the visions of the collections, ecology, and conservation communities: Integrating Collections, Ecological Research and Conservation. We invited speakers to share how they use collections data in their ecological research, and to include some of the challenges in using this information. Specifically, we asked our presenters:

How to reach more ecologists
Which ecological groups specifically are well-suited to using collections data, in your experience?
What data are missing from collections data that, if collected in the future, would make collections data better for ecologists?
Your ideas for more potential stakeholders / consumers of collections data
Software / hardware / skills / knowledge needed to facilitate this enhanced collaboration between collections and ecology.

Background.

Anticipating current and future data needs of researchers requires us to gather and synthesize diverse input and insights. Today’s ecologists evaluating current practices suggest changes in standards of practice for both researchers and scientific collections (Morrison et al. 2017; Ward et al. 2015; Schilthuizen et al. 2015; Pyke and Ehrlich 2010). Morrison et al. (2017) present several questions to frame planning for future needs, including:

What information regarding past species distributions, environmental conditions, and human-environmental interactions should we collect and analyze to inform today’s management of natural and cultural resources?
What kinds of data, specimens, and other objects should we gather in the years ahead to establish a baseline for future comparison?
How could we systematize and streamline the collection of [high] priority data streams?
What kinds of partnerships are needed to collect, receive, and curate specimens and samples from the place and resources of interest?
Can we build a constituency for further investment by publicizing examples of how archived materials have been valuable in resource management decisions?
‘What do we need to leave behind today to position our future selves for success?’ An interdisciplinary workshop may be a productive way of elucidating priorities.

In the Network Integrated Biocollections Alliance (NIBA) Implementation Plan (NIBA 2013), the collections and research sectors outline six goals with commensurate objectives. Some of these relate directly or synergistically to the future as envisioned by ecologists. For example, NIBA (goal 6) asks for collections specimen data to be integrated into K-20 formal and informal education to support learning about such topics as climate change and molecular ecology. Those writing the NIBA plan included a call for a “broader spectrum of stakeholders supporting the NIBA vision for the use of digitized specimen data in research and expanded nonfederal collaborations with international, regional, state, and local agencies with an interest in species occurrence data.” Looking forward, NIBA outlined the need to:

“Advance engineering of the US biodiversity collections cyberinfrastructure.” It makes sense to expand the community of those using collections data while, at the same time, continuing to improve future data.
Implement adaptive technology strategies around core discipline standards to enable efficient digitization workflows, effective data management, permanent data archives, innovative and synthetic research, effective biodiversity policy, and ubiquitous educational engagement.
Organize existing examples of leading-edge integration, identify likely new areas leading to research insight.

Our speakers.

Dave Tazik shared how the NEON community navigates the collections and ecology intersection. To better facilitate collaboration and constructive outcomes, they reconfigured NEON Task Groups so that all groups have representation of both ecologists and collections staff. And of course, NEON vouchers specimens along with recording data associated with the sampling event and remote sensing data.

Anna Monfils (Central Michigan University) brought us a fantastic conservation use case with the Prairie Fen Biodiversity Project (PFBP). In this effort, online digitized data provide “unprecedented access to biodiversity data and facilitates data accessibility, current data updates, and a broader use of the specimen and research data both within our research team and with associated conservation and management partners.”

Collections data can be used to predict phenology, demography, and also extinction risk. In an exciting talk by Barry Sinervo (University of California Santa Cruz EEB), we learned how he uses specimen occurrence records with published ecology data to develop species distribution models that are highly predictive of extinction risk.

Jennifer McGuire (Georgia Tech) took us eight stories down a natural pitfall trap (Natural Trap Cave, Wyoming) that provides fossil evidence of organisms from 20,000 years ago. Remains in this cave deliver a perfect resource for paleoecology research investigating spatial questions about ecology and evolution and the impacts of climate change. The remains found here provide a rich data source for questions ranging from intraspecific specimen-level variation to species distributional changes in relation to climate change over geologic time.

We also need to know how researchers use, synthesize, and re-use scientific data. Understanding scientists’ perspectives and insights helps us better meet their data needs for future research. We invited University of Florida (UF) graduate student Elizabeth Martin to share preliminary findings from her survey of scientists and professionals about reuse of species occurrence data and use of web-based information systems.

Our speakers gave us some clear feedback about data needs.

Dave Tazik compiled a list for us of recent data requests at NEON to show us the wide variety of research being done with this information (see screen shot right). Having use cases like these helps us understand and provide appropriate resources for those using and integrating heterogeneous data.

Jenny McGuire notes that some of the data needed to improve models are often difficult to access. These include data about physiological limitations, interspecific interactions, and habitats and landscapes. Museum data could fill some of these gaps by providing species co-occurrence notes, standardized habitat data, location-specific physiography, and absence data. [Note from Deb here on a model for collecting and sharing absence data. This is a shout-out about Andrew Short, entomologist at the University of Kansas. Andy routinely records absence data when out collecting aquatic beetles – and he has a database designed to store and share the presence and absence data. See CReAC to learn more]. Jenny also pointed out the need to include local covariate occurrences and data, e.g., weather or habitat type, for building models that can answer questions about interactions. She added that integrating collections data into mammal range prediction models using MaxEnt may improve the models.
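
To make the absence-data idea above a little more concrete, here is a minimal Python sketch of how one sampling event could record both detections and non-detections, so that a modeler can tell "looked and did not find" apart from "never looked". The status values "present" and "absent" follow the Darwin Core `occurrenceStatus` term. Everything else (the `Occurrence` class, the `expand_event` helper, the event identifier, and the placeholder taxon names) is hypothetical and is not drawn from any database mentioned in the talks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    event_id: str
    scientific_name: str
    occurrence_status: str  # "present" or "absent"
    habitat: str            # a simple covariate recorded with the event


def expand_event(event_id, habitat, target_taxa, found_taxa):
    """Emit one record per target taxon, marking taxa not found as absent."""
    found = set(found_taxa)
    return [
        Occurrence(event_id, name, "present" if name in found else "absent", habitat)
        for name in target_taxa
    ]


# Hypothetical survey: three target taxa were searched for, one was found.
records = expand_event(
    event_id="EVT-001",
    habitat="stream margin",
    target_taxa=["Taxon A", "Taxon B", "Taxon C"],
    found_taxa=["Taxon A"],
)
absences = [r.scientific_name for r in records if r.occurrence_status == "absent"]
```

The design choice worth noting is that absence is only meaningful relative to a declared list of target taxa for the event, which is why the sketch takes that list as an explicit input.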

To get better data in the future, Jenny also notes that the collections community should strive to provide not only the voucher specimens and tissue samples, but also guidelines about what covariate data to include. She suggests images of the specimens and locality to allow inference of covariate data. Lastly, Jenny noted the value of access to aggregated data and images online, which makes it possible for researchers at institutions without collections to better identify what they find and learn more about community change through time.

Anna Monfils and graduate students Rachel A. Hackett and Clint D. Pogue needed to integrate data (ecological, collections, etc.) across members of the research and management groups. Anna notes that to do this, they had to spend quite a bit of time resolving vocabulary and deliverables issues between researchers and land managers. Together, they developed the PFBP using Symbiota. Anna noted several other challenges when working to use conservation data in her research (see screen shot).

Barry notes that museum records are being protected by investment from programs like ADBC. However, field stations’ specimens and data records are in danger of being lost. The data used in extinction modeling provide critical information needed to show where to collect now, before some of these species are gone. It took two weeks to merge museum data with molecular phylogeny data, and there are data standards issues. It took 1.5 months to update the taxonomy. To expedite this process, Barry hopes to have a platform on the web that anyone can use. He also asks how we can incorporate physiology data (including temperature) and get taxonomy updated faster.

Academics often learn about data resources on the web, whereas non-academics discover these resources by word of mouth from their colleagues. This tidbit from Elizabeth Martin and Mary Klein (NatureServe) tells us something about how we need to think about advertising our data sources. Interestingly, the majority of academics prefer raw data. This poses the question of how much time we should spend cleaning it – and if we continue to clean it, who is we (collections? aggregators)?

Insights and outcomes from the post-talk discussion session.

Several themes emerged from the conversations following the talks.

Overall, collecting new data and specimens is less of an issue compared to the historical backlog. While we do need recommendations for future data collecting, new collecting is under ever-increasing regulation. We need to know how to make the most of what we have already – to link it to environmental data.

One, ecologists need data to supplement occurrence information. In addition to climate data, researchers need location-specific physiography, interaction and association information such as plant-pollinator data, co-occurrence data, standardized habitat data, landscape and land-use data, and absence data. Participants commented that at least some of these data, like associated species, are already present in our occurrence records but are in free-text fields, making them difficult (if not impossible) to find and use. Changes will be needed at all levels (e.g., software, human behavior and knowledge, algorithms, ontologies) to address these challenges. Scientists using specimen occurrence data also need images of specimens in their habitat before collection to capture habit, habitat, landscape, associated species, morphology, phenology, and seasonal information. Images also provide ways for scientists to verify specimens and validate some morphological features on new finds - especially when they do not have collections at their own institutions (see screen shot on right, Jenny McGuire's talk).
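
As a rough illustration of the free-text problem described above, the following Python sketch pulls candidate associated-species names out of a remarks string and writes them to a structured field. The field names `occurrenceRemarks` and `associatedTaxa` follow Darwin Core terms, as does the " | " list separator. The sample record, the trigger phrases, and the function `extract_associated_taxa` are hypothetical, and a real workflow would still need to validate the extracted names against a taxonomic authority.

```python
import re

# Hypothetical phrases that often introduce associated species on labels.
TRIGGER = re.compile(
    r"\b(?:growing with|associated with|associated species:?)\s+(.*)",
    flags=re.IGNORECASE,
)

# A binomial-looking name: capitalized genus followed by a lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]{3,})\b")


def extract_associated_taxa(remarks: str) -> list[str]:
    """Return binomial-looking names that follow a trigger phrase."""
    match = TRIGGER.search(remarks)
    if not match:
        return []
    # Keep only the clause after the trigger, up to the first period or
    # semicolon. This simple cut would mishandle abbreviated genera ("C. stricta").
    clause = re.split(r"[.;]", match.group(1))[0]
    return BINOMIAL.findall(clause)


# Hypothetical record with associated species buried in free text.
record = {
    "occurrenceRemarks": "Fen margin, growing with Carex stricta and "
    "Typha latifolia; soil saturated."
}
names = extract_associated_taxa(record["occurrenceRemarks"])
record["associatedTaxa"] = " | ".join(names)
```

A pattern-based pass like this can only surface candidates; it is meant to show why structured fields are easier to search than remarks, not to stand in for curated data entry.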

Two, researchers using occurrence data apply their expertise in order to evaluate and improve the occurrence data records; this knowledge often does not get back to the original data provider. For example, a scientist reviews and updates the taxonomic identifications before using the data in their research – but these annotations never get back into the collections database. This is a known issue, with many hurdles, but it would help to collate the sticking points to see what is needed to address each one. The situation with georeferencing is analogous. Ecologists evaluating existing geopoints and georeferencing other occurrences often find it very difficult (if not impossible) to provide this information to the original data provider.

Three, researchers, land-use managers, etc., don’t know the data exist. Tackling the above issues will not have much impact if the data are an untapped, unknown resource. We learned that researchers most often use word of mouth to find data. This means we must engage and collaborate with communications experts to figure out how to reach this audience of would-be users of collections data. We will need built-in assessments to see if implemented strategies are succeeding. The available data formats and structure will influence use / re-use, as some communities may want more processed, more linked-data datasets that require less manipulation before use. And more integrated decision support tools like the PFBP resource will be needed to make it easier to use collections data.

Four, we need to look outward and engage new communities. Climate scientists want specimen data to add to their climate layers. Faculty members could create courses that incorporate these data into biology and environmental studies to reach more future end users, as championed by the new BLUE (Biodiversity Literacy in Undergraduate Education) RCN. And, we need to think beyond biology students. We need to engage with other students and faculty in such departments as public policy, public health, and geographic information systems (GIS).

As Morrison et al. (2017) suggest: “An interdisciplinary workshop may be a productive way of elucidating priorities.” One example is this half-day workshop – where we’ve already begun to learn more about what data we need to be collecting, recording and linking, and what channels we must use to reach scientists, policy makers, industry, and the land management sector. We need to use human social network opportunities – we will need the help of social scientists to successfully disseminate knowledge of collections data and its potential. As one of our outcomes, ICER is working to produce a 5-year plan to document and outline how to reach these goals, including:

having more conversations like this one
understanding more deeply the wants and needs of researchers, managers, educators, and other collections data users
demonstrating the utility and benefits of natural history collections data in a variety of applied, academic, and educational fields
sharing ideas through in-person dialogues, cross-pollination of communities at conferences, and publications in peer-reviewed literature
working with existing tools and resources to meet the needs of data users

Also, ICER will be meeting online regularly, publishing some commentary about our observations and deductions so far, and looking into designing a workshop series to address the needs and goals derived from our conversations.

Recordings.

Integrating Collections, Ecological Research and Conservation - Half-day Symposium (Part 1)

http://idigbio.adobeconnect.com/pd8jpb2vqcau/
David Tazik (NEON) (recording time mark 1:55-17:40)
Jenny McGuire (Georgia Tech) (recording time mark 19:54-35:22)

Integrating Collections, Ecological Research and Conservation - Half-day Symposium (Part 2)

http://idigbio.adobeconnect.com/pys67j468xhl/
Anna Monfils (CMU) (recording time mark 00:44-18:23)* no sound - slides only
Barry Sinervo (UCSC) (recording time mark 19:23-35:43)
Elizabeth Martin (UF) (recording time mark 36:15-50:15)

‘What do we need to leave behind today to position our future selves for success?’

REFERENCES.

Alba C, Levy R. 2017. Natural History Collections as Primary Data in Ecological Research. Denver Botanic Garden Research Staff Blog post @iDigBio https://www.idigbio.org/content/natural-history-collections-primary-data-ecological-research

American Institute of Biological Sciences. 2013. Implementation Plan for the Network Integrated Biocollections Alliance (NIBA). Reston, VA, USA.

Morrison A, Sillett TS, Funk WC, Ghalambor CK, Rick TC. 2017. Equipping the 22nd-Century Historical Ecologist. Trends in Ecology & Evolution, Volume 32, Issue 8, 578-588. DOI: http://dx.doi.org/10.1016/j.tree.2017.05.006

Pyke GH, Ehrlich PR. 2010. Biological collections and ecological/environmental research: a review, some observations and a look to the future. Biological Reviews 85, 247-266. DOI: http://dx.doi.org/10.1111/j.1469-185X.2009.00098.x

Schilthuizen M, Vairappan CS, Slade EM, Mann DJ, Miller JA. 2015. Specimens as primary data: museums and ‘open science.’ Trends in Ecology & Evolution, Volume 30, Issue 5. DOI: http://dx.doi.org/10.1016/j.tree.2015.03.002

Ward DF, Leschen RAB, Buckley TR. 2015. More from ecologists to support natural history museums. Trends in Ecology & Evolution, Volume 30, Issue 7 DOI: http://dx.doi.org/10.1016/j.tree.2015.04.015

Clip Art CC0.
