Phenology Deep Learning Workshop Report

The workshop began on the morning of Thursday, January 17th at the University of Florida Biodiversity Institute in Gainesville, Florida, with Pamela Soltis and Gil Nelson introducing its themes and intentions. Bringing together iDigBio members and members of the phenological community from around the country and abroad, the workshop set out to explore technologies such as deep convolutional neural networks (CNNs), data protocols and standards for linking trait data to specimen records, tracking phenological synchronization across phyla, and tools for trait measurement and analysis.

On Thursday morning, several workshop participants presented their previous work and research, showcasing the breadth of the field and the expertise of the participants. First, Alexis Joly (Inria) spoke about the work Pl@ntNet has done on plant identification using deep learning, as well as current and future research directions. Then, Brian Stucky (UF) spoke about the machine learning phenology work that he and his collaborators have completed with Acer and Prunus specimens, and the lessons learned. Alex White (Smithsonian) described various ways his group has used deep learning models with herbarium specimens, such as detecting mercury staining and differentiating between spike moss and club moss. Nicky Nicolson (Kew) focused on collector data mining as a way to better explain biases, and also covered ways to annotate images consistently. Susan Mazer (CalPhenology project) walked us through workflows in which trained undergraduates identified and counted reproductive structures, and the research and questions that came from that work. Katie Pearson (Cal Phenology TCN) spoke about her project’s plans to store phenological data, work with Darwin Core to further develop phenological standards, and use the Plant Phenology Ontology. Her presentation also explored important, forward-looking questions about metadata and annotations. Patrick Sweeney (NEVP) spoke about the process of having human volunteers score phenological data, and gave examples of current limitations of the field (e.g., some taxa, such as grasses, sedges, and plants with small flowers, cannot be scored by humans from images). Finally, Renato J. Figueiredo (iDigBio) spoke about iDigBio’s forays into developing a deep learning processing pipeline, highlighting in particular the role of GUODA and the need to make it more compatible with other infrastructures. The full presentations can be found under “Presentations.”

After a coffee break, we regrouped to settle on the major themes and topics that would guide our work for the remainder of the workshop, including big questions, bottlenecks, and workflows. We tackled broad themes, such as the ultimate goals for using deep learning techniques in phenological research, and addressed more specific questions, such as standards for image size and confidence scoring. Remote participants also joined the conversation, including Charles Davis (Harvard University) and Katja Seltmann (UCSB).

On Thursday afternoon, participants divided into small groups for an activity with the prompt “Write the introductory thesis paragraph of a paper that outlines the perfect system to satisfy all needs for phenological scoring.” After some time, we reassembled to compare and contrast the components of our “perfect systems.” We noticed remarkable consistency across the groups, with many recognizing the need for a versatile, collaborative, and standards-based approach. The activity helped us get a sense of where the field is now, as well as the future direction of phenological research. Building on this activity, we generated the themes for the following day’s break-out topics: the “burning issues” of the workshop. On Thursday evening, many participants met for dinner in Gainesville and continued the conversations over their meals.

On Friday, break-out sessions continued with group discussions on the main themes and “burning issues” that we had identified the previous day. Participants self-selected their groups based on their interests or what they had to offer particular groups. These topics ranged from standards and scoring to machine learning and cyberinfrastructure, and also focused on ways to extend the lessons learned from phenology to other fields.

The content from these break-out groups became the basis for a paper on machine learning in phenology research, which we began drafting on Friday afternoon. We worked as a group to organize the content we had generated and to outline the key sections of the paper. We then broke into groups to address the sections that we felt most comfortable tackling. The paper is in progress, and work led by Katie Pearson has continued since the session.

A list of the participants is below:

Susan Mazer, University of California, Santa Barbara

Brian Stucky, Florida Museum of Natural History, University of Florida

Katelin Pearson, California Polytechnic University, San Luis Obispo

Alex White, Department of Botany / Data Science Lab, Smithsonian Institution

Sylvia Orli, Smithsonian

Patrick Sweeney, Yale Peabody Museum of Natural History

Emily Meineke, Harvard University Herbaria

Charles Davis, Harvard University Herbaria

Ellen Denny, USA National Phenology Network (USA-NPN) / University of Arizona

Laura Brenskelle, University of Florida

Nadya Williams, University of California, San Diego

Libby Ellwood, Natural History Museum of Los Angeles

Pierre Bonnet, CIRAD

Alexis Joly, Inria

Hervé Goëau, CIRAD

Titouan Lorieul, Inria / University of Montpellier

Nicky Nicolson, Royal Botanic Gardens, Kew and Department of Computer Science, Brunel University, London

Katja Seltmann, UC Santa Barbara

Isaac Park, University of California, Santa Barbara

Mason Heberling, Carnegie Museum of Natural History

Myla Aronson, Rutgers University

Renato Figueiredo, University of Florida

Annika Smith, Florida Museum of Natural History

Pam Soltis, Florida Museum of Natural History

Gil Nelson, Florida Museum of Natural History