Opening up Behavioral Data

Thu, 12/08/2016 - 3:54pm -- gnelson


Mike Webster (Cornell Lab of Ornithology) and Gil Nelson (iDigBio)

The advent of open data has led to increased availability of digital information across all domains of the biodiversity sciences. While many of these data ensure the availability of scientifically relevant occurrence records of vouchered museum specimens and field-based observations, most are textual in nature, relatively simple in structure, and require only a modicum of storage capacity. Even where associated media are included, they usually consist of 2D images or other media with relatively modest storage requirements. The iDigBio portal, for example, currently contains about 75 million collection object records that are largely text-based, with fewer than 20 million media records, mostly 2D images.

More recently, animal behavior researchers and behavioral ecologists have begun exploring methods for publicly storing and distributing more complex data types that capture the extended behavioral phenotype of the individual organism. These might include recordings of vocalizations and communication signals, videos of typical behaviors (e.g., courtship displays, antipredator behaviors, or parental care), tracking data of individual movement patterns, or high-speed recordings of individual limb movements. In many cases, these types of data consist of millions of hours of raw audio and video recordings, often augmented by coded and processed interpretations extracted from them. The sheer size of these datasets makes it challenging, if not impossible, to adhere to the size restrictions that most publications impose on supplementary materials, or even the restrictions of dataset archives or repositories.

In late October 2016, thirty-six experts from various disciplines convened under the auspices of the Cornell Lab of Ornithology’s Macaulay Library to explore ways of making the vast stores of existing and potentially new animal behavior data available, accessible, and searchable. Funded by the NSF and hosted at the Greek Peak Resort near Ithaca, NY, the workshop attracted a wide array of researchers, professional journals, and data publishers, including leaders of the Animal Communication Signals TCN headquartered at the Macaulay Library. iDigBio was pleased to be invited to take part in the discussion and to contribute to the pool of ideas.

The workshop provided for rich discussion and ample opportunity to record conclusions and highlights. There were virtually no presentations; the one- to two-hour sessions were instead devoted to small-group discussions designed to catalog challenges and work out potential solutions, beginning with the provocative question “Do we want to do this?” Discussions often ran long and bled over into lunches, breaks, and the evening reception. Over the two days, participant groups filled more than 20 sheets of newsprint, which are now being distilled into a white paper to which participants can add ideas remotely.

The workshop ended with commitments to continue this effort. Next steps include fleshing out a “roadmap” for behavioral researchers interested in digitizing and archiving their data, publishing best-practice guidelines for doing so, and creating a white paper with recommendations for the broader behavioral research community. Longer-term goals include building tools and workflows that help researchers work with their data while simultaneously making it more broadly available.