Data Without Borders ICE 2016

From iDigBio
Revision as of 16:27, 3 October 2016 by Kevinlove (Talk | contribs)

Digitizing the Past and Present for the Future

Quick Links for Data Without Borders ICE 2016, Monday 26 Sept, 130-530 EST

ICE 2016 Symposium Abstract

Summary Statement: Many new and updated methods for collecting biological specimens now result in faster access for everyone to richer, more robust data for research. Scientists are learning new skills for collecting and managing field and lab data using relevant data standards, and publishing enhanced data sets as a result. Best practices for describing data sets with metadata are leading to improved data discovery. Researchers now have access to ever larger data sets for visualization, analysis, and modeling. In our symposium, we present a broad array of examples of the latest developments in biodiversity research using biological specimen data, including genomics, habitat, and trait data. We present current trends in collecting and vouchering of specimens and field data, methods and tools for digitizing the specimen data, and tools and skills needed for visualizing the data. We then highlight how the data are being used, especially for research that expands our understanding of biodiversity. Our Data without Borders session naturally fits the Entomology without Borders theme, addressing the world-wide need for fit-for-research-use data. An underlying theme for Data without Borders is International Collaboration for Biodiversity. In the last ten years, many changes such as powerful hand-held devices, apps, and computing in-the-cloud have made it possible to collect, use, and share data more easily, and in ways that support re-use. Collaboration makes it possible not only to document biodiversity more quickly but also to provide better tools and better data. We will provide examples of this type of collaboration in this symposium.

Date: Monday 26 Sept 2016, 130-530 EST

Short Description

Description: In Data without Borders, we feature talks about collecting museum specimens and digitizing the specimen data to support biodiversity research. Scientists show us how they are using biological specimen data in their research and we include presentations on career skills needed for 21st century digital collections and collaborative research.

time talk presenter / authors
130 Specimen Data in Integrated Biodiversity Research
Emerging cyberinfrastructure and new data sources provide unparalleled opportunities for mobilizing and integrating massive amounts of information from organismal biology, ecology, genetics, climatology, and other disciplines. Key among these data sources is the rapidly growing volume of digitized specimen records from natural history collections. With 60 million specimen records available online through iDigBio, these data provide excellent information on species distributions and changes in distributions over time. Particularly powerful is the integration of phylogenies with specimen data, enabling analyses of phylogenetic diversity in a spatio-temporal context, the evolution of niche space, and more. Such data-driven synthetic analyses may generate unexpected patterns, yielding new hypotheses for further study. However, a major challenge is the heterogeneous nature of complex data, and new methods are needed to link these divergent data types. Ongoing efforts to link and analyze diverse data are yielding new platforms for comparative analyses of biodiversity data. We present a case study that integrates phylogenies with heterogeneous data across spatial and temporal scales, and we explore patterns of phylogenetic diversity in Florida plants, with fundamental implications for the conservation and management of Florida’s ecosystems. This test case sets the stage for further integration of phylogenetic, distributional, temporal, and environmental data for discovering patterns of biodiversity and the processes that shape its distribution, assembly into communities, and interactions in ecosystems. Although many specific hypotheses may be addressed through integrated analyses of biodiversity and environmental data, perhaps the greatest value of such data-enabled science will lie in the unanticipated patterns that emerge.
Pamela Soltis, University of Florida, Gainesville, FL
145 Like blood from that stone we always hear about: a quest to extract meaningful data from historical grasshopper specimens
Introduction: One of the most ancient ecosystems in the southeastern U.S.A. is scrub, often associated with ridge systems that were most likely used as refugia during Pleistocene sea level changes. Following sea level stabilization, these habitats effectively remained islands due to unique soil composition and a lack of plant diversity leading to a myriad of floral and faunal endemics. In particular, arthropod endemics abound as in the grasshopper genus Melanoplus (Orthoptera: Acrididae: Melanoplinae). Many genus members possess short wings incapable of flight and are unable to easily disperse over large distances, which makes such Melanoplus species ideal candidates for examining speciation hypotheses. To test such hypotheses, the Puer Group (PG), comprised of 24 species with related morphology, was chosen. The group spans four neighboring states (FL, GA, SC, and NC), contains many scrub endemics, and its males exhibit great genitalia variation. A good beginning for delving deeper into the group’s evolutionary history was determining current species ranges by georeferencing around 5,000 specimens, borrowed from various U.S. collections and gathered in the field during recent expeditions.
Methods: The backbone of this project is to collate as much data as possible for the PG through the creation of maps, detailed field notes, and different types of anatomical imaging.
Results/Conclusion: The collated data will then be disseminated to a wide audience for three purposes: 1) to raise awareness of a fascinating system of study, 2) to create a solid platform for future studies to build upon, and 3) to demonstrate the utility of integrating multiple methods of investigation.
Derek Woller and Hojun Song, Texas A&M University, College Station, TX
200 Acquisition, management, and analysis of historical and contemporary data to discern legacy effects of ecological extinction on insect biodiversity
Numerous economically important tree species are threatened with declines due to exotic pests or pathogens. Perhaps the best-known example is American chestnut, a culturally and ecologically important tree decimated by chestnut blight. The loss of American chestnut from the canopy in eastern deciduous forest had profound impacts on vertebrate food webs, but the effects of chestnut loss on insect biodiversity and trophic interactions remain largely unknown. The development of blight-resistant chestnut creates an unparalleled opportunity to study the effects of foundation species loss, and potential recovery, on insect food webs. Our objective is to generate and analyze historical and contemporary insect food webs for chestnut and oak focusing on herbivores and their natural enemies. We constructed a data matrix containing 1,049 records of individual insects associated with chestnut based on information and specimens from the Hopkins Notes and Records System housed at the Smithsonian Institution National Museum of Natural History. These records constitute 221 species, including herbivores and their natural enemies. In order to discern the extent of novel data derived from the Hopkins System, we surveyed the primary literature to construct a data matrix containing 259 records of individual insects representing 157 species associated with chestnut. Additionally, we placed three insect flight traps each in the canopies of American chestnut, Chinese chestnut, and red oak trees to discern the contemporary insect fauna sympatric with those tree species. This effort yielded 75, 72, and 75 samples, respectively, for the aforementioned trees with target insects exceeding 100,000 specimens. Lastly, we hand collected 279 lepidopteran larvae to discern host-natural enemy associations on the target tree species. 
This presentation focuses on three areas: (1) how multitrophic data derived from natural history collections, the literature, and contemporary sampling are gathered and managed; (2) questions data gathered from those sources might address using this research as an example; and (3) how products from this research are disseminated.
Robert Kula, USDA-ARS, Washington, DC; John Lill, The George Washington University; Eugenio Nearns, Purdue University; and Harmony Dalgleish, College of William and Mary
215 Digitizing natural history collection specimens to investigate the future of species conservation
Natural history collections (NHCs) are rich repositories that document our planet's ecosystems, both past and present. Within the past decade there has been a surge of efforts to revisit NHCs and digitize their specimens. Digitized NHCs can provide a wealth of insight into the ecology, abundance, and distribution of rare and common bees (Hymenoptera: Anthophila). Studies on the historic abundance and distribution of bee species in particular have revealed alarming trends of population decline and local extinctions. Given the rapidly growing digital vault of bee data across multiple institutions, we will discuss how NHCs can inform the conservation of bees. However, we will also highlight some of the limitations and biases of digital specimen data that must be considered when characterizing bee communities.
Jonathan Koch, Utah State University, Logan, UT; Joan M. Meiners; and Amber D. Tripodi
230 Harnessing specimen data to visualize and investigate the ecology of species
Specimen data can be digitized via a collecting event approach in order to maximize efficiency and accuracy. The collecting event approach involves attaching specimen data to previously digitized collecting event information, rather than attaching data to specimens. Using source materials such as field notes allows for fewer transcription errors and increases the precision with which localities can be georeferenced. In addition, this approach distinguishes between true absences and collecting artifacts, allowing a researcher to investigate why specimens occur at certain sites and are absent at others. Several recent digitization projects using this method will be examined, including the North American Macroinvertebrate Database and CReAC. Specimen data can be digitized using source materials such as field notes in order to increase the accuracy of the data and the efficiency of the digitization workflow.
Sarah Schmits, Andrew Short, University of Kansas, Lawrence, KS
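The collecting event approach described in the abstract above can be pictured as a small data model. The following is only a minimal sketch under stated assumptions, not the presenters' actual schema: the class names, field names, and the `occurrence_matrix` helper are hypothetical, and the records are invented placeholders. The idea is that each event is digitized once and each specimen only points at its event, so a visited site with no specimen of a taxon can be told apart from a site that was never sampled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectingEvent:
    """One field visit, digitized once from source material such as field notes."""
    event_id: str
    locality: str
    latitude: float
    longitude: float
    date: str  # ISO 8601 date string


@dataclass(frozen=True)
class Specimen:
    """A specimen record that references an event instead of repeating its data."""
    catalog_number: str
    taxon: str
    event_id: str


def occurrence_matrix(events, specimens, taxa):
    """Return {event_id: {taxon: bool}}.

    False means the site was sampled but the taxon was not collected there,
    which is a candidate absence rather than a gap in sampling.
    """
    matrix = {e.event_id: {t: False for t in taxa} for e in events}
    for s in specimens:
        if s.event_id in matrix and s.taxon in matrix[s.event_id]:
            matrix[s.event_id][s.taxon] = True
    return matrix


# Hypothetical placeholder records, for illustration only.
events = [
    CollectingEvent("E1", "Stream site 1", 38.95, -95.25, "2014-06-03"),
    CollectingEvent("E2", "Stream site 2", 38.97, -95.30, "2014-06-04"),
]
specimens = [
    Specimen("CAT-0001", "Taxon A", "E1"),
    Specimen("CAT-0002", "Taxon B", "E1"),
    Specimen("CAT-0003", "Taxon B", "E2"),
]
matrix = occurrence_matrix(events, specimens, ["Taxon A", "Taxon B"])
```

Here `matrix["E2"]["Taxon A"]` is `False` because site E2 was visited and no specimen of Taxon A was recorded, whereas a site with no event at all simply does not appear in the matrix.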
245 The usefulness of DNA-barcoding databases for routine taxonomic research and identification of Lepidoptera
Due to our extensive knowledge about the taxonomy of Lepidoptera and the ease with which the appropriate tissue samples can be obtained from dry specimens in collections, butterflies and moths have served as a model group for developing DNA barcoding methodology. DNA barcoding has now become a routine tool in taxonomy, and many species of Lepidoptera have been barcoded at least once. For example, the Barcode of Life initiative has produced almost a million such sequences for 84,000 species of Lepidoptera (half of the world’s described fauna). However, most species are represented by a single or few barcodes. In my presentation, using my own taxonomic work as well as several examples from the work of my colleagues at the McGuire Center for Lepidoptera and Biodiversity, I would like to argue that many more resources need to be invested into mitochondrial DNA barcoding, generating not only barcodes for species and subspecies that have not been barcoded to date, but also representing as many populations and individuals as possible. While genomic methods are increasingly popular for the purposes of phylogenetic reconstruction and evolutionary research, and currently dominate grant proposals, I argue that it is equally important to direct more resources towards DNA barcoding, which has proved to be the best taxonomic tool developed in the last 100 years for resolving current taxonomic conundrums, for revealing cryptic species and for describing biodiversity.
Andrei Sourakov, University of Florida, Gainesville, FL
315 The intersection of data domains underlying insect systematics: case studies in parasitic Hymenoptera
Taxonomy/systematics has long been a highly integrative science, uniting data from domains as diverse as geography, time, phenotypic features, genotypes, literature, and media. The core of biodiversity discovery - examination of the features of individual organisms - remains constant, but collection, storage, and sharing of data have transformed the field. Geographic data can be recorded more accurately through the development of new tools and liberation from the space restrictions of specimen labels. The magnitude of error in these data over time and the fitness for use of legacy data will be discussed. Features of organisms can be studied from both a geographic and temporal point of view. As examples, the geographic variation in sex ratio of Pelecinus polyturator and the distribution of mimetic/apomorphic color patterns in Scelioninae are presented. Taxonomic workflows before and after the introduction of databasing technologies are compared, focusing on increased accuracy and completeness. These enhancements come with costs in both time and money, and these costs are quantified. The rate of taxonomic activity over the past 250 years is compared for different taxa, and the social dimension of this work in terms of gender, language, and geographic location is discussed. The introduction and adoption of informatics technologies have altered the skill set needed by the next generation of taxonomists, but this is not yet reflected in formal training programs.
Norman Johnson, The Ohio State University, Columbus, OH
330 Preventing Bugs in Data Analysis: Data Skills to Improve the Reliability and Effectiveness of Entomological Research
Our increasing capacity to collect data is changing science. This is particularly true as specimen data are being digitized and availability of data is no longer the bottleneck. There is great potential for discovery, but we are largely failing to translate this sea of data into scientific advances, because researchers are not trained in the skills needed for effective management and analysis. The question then becomes, in addition to scaling data production and computation, how do we develop and deliver training to scale the number of data-literate researchers? Course curricula are slow to change, need qualified instructors, and are already full. Short courses are oversubscribed and reach a limited number of participants. To provide scalable and distributed training, Data Carpentry develops and teaches domain-specific hands-on workshops in data organization, management, and analysis. This is a grassroots training effort developed by practitioners for practitioners, who identify core skills and collaboratively develop lessons. All lessons are open source, and workshops are taught by volunteers trained by the Software Carpentry Foundation. With iDigBio, a focus has been on training in the biodiversity community. Workshops are designed for people with little to no prior computational experience and teach in two days how to organize and clean data, manage data in SQL, and analyze and visualize data in R, covering the full data lifecycle. Workshops are in high demand, but this model allows for scaling of training and teaches the foundational skills to get biologists started managing and analyzing their data effectively.
Tracy Teal, Michigan State University, East Lansing, MI
345 Developing Best Practices for Data Management Across all Stages of the Data Life Cycle
Best practices for empirical data collection (experimental design, laboratory techniques) are often well-covered in undergraduate and graduate training, yet there has been less emphasis on managing the resulting data effectively. This is an increasingly important skill set; many funding agencies require data management plans, and journals are requiring that data pertaining to published articles be accessible. Researchers with good data management skills will be able to maximize the productivity of their research program, effectively and efficiently share their data with the scientific community, and potentially benefit from the re-use of their data by others. In this talk, I will highlight some of the pitfalls to be avoided when working with data and introduce example best practices and tools that will improve your data management skills and research program.
Amber Budden, DataONE, Albuquerque, NM
400 Data capture methodologies in digitisation of bee pollinators
Digitisation is an activity that museums and academic institutions increasingly recognise, though many still do not embrace, as a means to boost the impact of collections for global research and society through improved access. Likewise, many researchers still fail to realise the importance of the data capture methodologies used in digitisation. New opportunities exist to design and implement processes, using available technology, that support data capture and enable a range of research on the biodiversity of pollinators, making scientific collections increasingly relevant. While specimen digitisation is useful for all taxa, immense additional benefits come from the digitisation of bees. This group of organisms is of prime importance as they provide most of the world’s pollination ecosystem services. Through international collaborative efforts, the wealth of data in natural history museums and collections about the diversity, distribution and biology of bees may be utilised for international biodiversity efforts.
Nicole Fisher, Australian National Insect Collection (ANIC), Clayton, Australia
415 The Current State of Arthropod Biodiversity Data In North America: Can We Address Impacts of Global Change?
There are well over 500 million arthropod specimens housed in approximately 1,000 collections worldwide. Although reliable estimates are not available, it is likely that less than 5% of these specimens have been digitized, and the current rate of digitization is probably not even adequate to keep pace with the acquisition of new specimens. If we hope to achieve the goal of digitizing all specimens by 2050, we need to develop global networks that can overcome many of the constraints we face today. We will review the current holdings of arthropods in collections across continents and digitization efforts from data providers to aggregators. More specifically, we will assess the types of collaboration needed and the technological and social network areas that are developing to attain the goal of full digitization.
Neil Cobb, Northern Arizona University (NAU), Edward Gilbert, Nico Franz, and Katja C. Seltmann
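To give a sense of scale for the 2050 goal in the abstract above, the figures it quotes can be turned into a rough required rate. This is only a back-of-the-envelope sketch, not a calculation from the presenters: the function name is hypothetical, the start year of 2016 is assumed from the symposium date, the abstract's "well over 500 million" and "less than 5%" are treated as exact values, and new acquisitions are ignored, so the result is best read as a lower bound.

```python
def required_annual_rate(total_specimens, fraction_digitized, start_year, target_year):
    """Average number of specimens to digitize per year to finish by target_year."""
    years = target_year - start_year
    if years <= 0:
        raise ValueError("target_year must be later than start_year")
    remaining = total_specimens * (1 - fraction_digitized)
    return remaining / years


# Assumed inputs: 500 million specimens, 5% already digitized, 2016 to 2050.
rate = required_annual_rate(500_000_000, 0.05, 2016, 2050)
print(f"Roughly {rate / 1e6:.0f} million specimens per year")
```

Under these assumptions the backlog alone works out to roughly 14 million specimens per year, before counting any newly collected material.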
430 Database before you label – the key to a digitized collections future
Digitization of millions of historic entomology specimens remains an enormous challenge. Our community should not make this challenge worse by generating newly collected, undigitized specimens. Entomologists in North America currently generate many tens of thousands of new specimens annually that are added to our undigitized backlog. The University of Alaska Museum Insect Collection contains over 1 million specimens represented by ~230,000 database records, of which 82% have been collected since the year 2000. This talk will describe the rapid growth of our collection and database. Methods used are similar to those established by Costa Rica's INBio in the 1990s.
Derek S. Sikes, University of Alaska, Fairbanks, AK
445 Troubleshooting industrial insect digitisation
Natural history collections are one of the most important sources of biodiversity information and their digitization is essential for providing greater access to both researchers and the general public. Industrial approaches are needed in order to mobilise the vast numbers of specimens (up to 10 billion) accumulated by the natural history museums in the world. Following the experience of the Digital Collection Programme (DCP) in the Natural History Museum we explore several ways of optimising the digitization process of insect collections. Success is impossible without an organised approach to project management, staff buy-in and administrative support on all levels. Key elements of industrial digitisation are: detailed yet flexible workflows which can accommodate different kinds of digitised material; automation through software and hardware; appropriate staff management; and community involvement.
Vladimir Blagoderov and Laurence Livermore, The Natural History Museum, Cromwell Road, London, England
500 DAMmed If You Do or Don’t: Life Cycles of Digital Assets
Imaging of specimens is now regular curatorial practice in entomological collections, complementing longer-standing efforts to capture label data and related information. Many different imaging approaches exist, but a common thread is that vast quantities of images are being amassed rapidly around the globe. Managing, preserving, and safeguarding this proliferation of images is critical to the success of digitizing entomological collections. This talk examines the life cycle of digital assets produced during imaging projects at the Yale Peabody Museum, with focus on student driven workflows in the Entomology Division and other curatorial departments. Once acquired, Peabody’s digital assets flow through its collections management system into a Yale University-wide digital asset management system (DAM). Peabody Entomology helped develop the Yale DAM, harmonize workflow and metadata from dissimilar campus units, and integrate several collections management systems with a single DAM endpoint. Adopting this infrastructure has allowed Peabody to disseminate its images and specimen metadata more broadly into “foreign” contexts, such as the Yale Library’s Finding Aid system and a campus asset discovery portal, alongside more well-known biodiversity outlets for entomological collections such as GBIF and the National Science Foundation’s iDigBio initiative.
Lawrence Gall, Yale University, New Haven, CT
515 Involving undergraduates in the digital community: Leveraging collections preservation, research, and outreach through a network of natural history collections clubs
In February of 2013, nine students at Arkansas State University came together to form the Natural History Collections Curation Club (NHC3). This club was an innovative approach to resolving many issues facing the natural history collections at A-State. The university houses collections in many disciplines. The collections were primarily built in the 1960s and 1970s, and by 2013 several of the collections were in disrepair due to a lack of funding and support. The students of the club made it their goal to restore the collections by dedicating their time and helping to secure funding. These efforts have resulted in funding from the Dean of the College of Sciences and Mathematics for a part-time student worker in the collections, supplies for several projects including jars and ethanol for restoring the fish collections and materials to create two large specimen mounts, and trips to visit several natural history museums. The NHC3 has helped A-State become recognized in the collections field where it was previously unknown. The club has also helped other universities increase student interest and involvement in collections. To date, two other universities have active natural history collections clubs as a result of the A-State model. Beginning in the fall of 2015 these three clubs will form a network to reach out to other universities that may benefit from this model. Our goal is to use the Natural History Collections Club Network (NHCCN) as a platform to motivate students across the United States to become more involved in university specimen collections.
Kari Harris, Arkansas State University, Jonesboro, AR

And please be sure to see this talk too:

  • Cross-pollination in the 21st Century: Integrating entomologists and botanists to explore the island biogeography and conservation of Caribbean orchids by Peter Houlihan, Florida Museum of Natural History, Gainesville, FL