Data Without Borders ICE 2016

From iDigBio
Revision as of 15:18, 21 December 2015 by Dpaul (Talk | contribs)

Digitizing the Past and Present for the Future

Quick Links for Data Without Borders ICE 2016
[Agenda]
[Biblio]
[Report]

ICE 2016 Symposium Abstract

Summary Statement: Many new and updated methods for collecting biological specimens now result in faster access for everyone to richer, more robust data for research. Scientists are learning new skills for collecting and managing field and lab data using relevant data standards, and publishing enhanced data sets as a result. Best practices for describing data sets with metadata are leading to improved data discovery. Researchers now have access to ever larger data sets for visualization, analysis, and modeling. In our symposium, we present a broad array of examples of the latest developments in biodiversity research using biological specimen data, including genomics, habitat, and trait data. We present current trends in collecting and vouchering of specimens and field data, methods and tools for digitizing the specimen data, and tools and skills needed for visualizing the data. We then highlight how the data are being used, especially for research that expands our understanding of biodiversity. Our Data without Borders session naturally fits the Entomology without Borders theme, addressing the world-wide need for fit-for-research-use data. An underlying theme for Data without Borders is International Collaboration for Biodiversity. In the last ten years, many changes such as powerful hand-held devices, apps, and computing in-the-cloud have made it possible to collect, use, and share data more easily, and in ways that support re-use. Collaboration makes it possible not only to document biodiversity more quickly but also to provide better tools and better data. We will provide examples of this type of collaboration in this symposium.

Short Description

Description: In Data without Borders, we feature talks about collecting museum specimens and digitizing the specimen data to support biodiversity research. Scientists show us how they are using biological specimen data in their research and we include presentations on career skills needed for 21st century digital collections and collaborative research.


time talk presenter / authors
Specimen Data in Integrated Biodiversity Research Pamela Soltis (psoltis@flmnh.ufl.edu), University of Florida, Gainesville, FL
Cross-pollination in the 21st Century: Integrating entomologists and botanists to explore the island biogeography and conservation of Caribbean orchids Peter Houlihan (phoulihan@ufl.edu), Florida Museum of Natural History, Gainesville, FL
Like blood from that stone we always hear about: a quest to extract meaningful data from historical grasshopper specimens
Introduction: One of the most ancient ecosystems in the southeastern U.S.A. is scrub, often associated with ridge systems that were most likely used as refugia during Pleistocene sea level changes. Following sea level stabilization, these habitats effectively remained islands due to unique soil composition and a lack of plant diversity leading to a myriad of floral and faunal endemics. In particular, arthropod endemics abound as in the grasshopper genus Melanoplus (Orthoptera: Acrididae: Melanoplinae). Many genus members possess short wings incapable of flight and are unable to easily disperse over large distances, which makes such Melanoplus species ideal candidates for examining speciation hypotheses. To test such hypotheses, the Puer Group (PG), comprised of 24 species with related morphology, was chosen. The group spans four neighboring states (FL, GA, SC, and NC), contains many scrub endemics, and its males exhibit great genitalia variation. A good beginning for delving deeper into the group’s evolutionary history was determining current species ranges by georeferencing around 5,000 specimens, borrowed from various U.S. collections and gathered in the field during recent expeditions.
Methods: The backbone of this project is to collate as much data as possible for the PG through the creation of maps, detailed field notes, and different types of anatomical imaging.
Results/Conclusion: The collated data will then be disseminated to a wide audience for a trio of purposes: 1) to raise awareness of a fascinating system of study, 2) to create a solid platform for future studies to build upon, and 3) to demonstrate the utility of integrating multiple methods of investigation.
Derek Woller (asilid@gmail.com) and Hojun Song, Texas A & M University, College Station, TX
Acquisition, management, and analysis of historical and contemporary data to discern legacy effects of ecological extinction on insect biodiversity
Numerous economically important tree species are threatened with declines due to exotic pests or pathogens. Perhaps the best-known example is American chestnut, a culturally and ecologically important tree decimated by chestnut blight. The loss of American chestnut from the canopy in eastern deciduous forest had profound impacts on vertebrate food webs, but the effects of chestnut loss on insect biodiversity and trophic interactions remain largely unknown. The development of blight-resistant chestnut creates an unparalleled opportunity to study the effects of foundation species loss, and potential recovery, on insect food webs. Our objective is to generate and analyze historical and contemporary insect food webs for chestnut and oak focusing on herbivores and their natural enemies. We constructed a data matrix containing 1,049 records of individual insects associated with chestnut based on information and specimens from the Hopkins Notes and Records System housed at the Smithsonian Institution National Museum of Natural History. These records constitute 221 species, including herbivores and their natural enemies. In order to discern the extent of novel data derived from the Hopkins System, we surveyed the primary literature to construct a data matrix containing 259 records of individual insects representing 157 species associated with chestnut. Additionally, we placed three insect flight traps each in the canopies of American chestnut, Chinese chestnut, and red oak trees to discern the contemporary insect fauna sympatric with those tree species. This effort yielded 75, 72, and 75 samples, respectively, for the aforementioned trees with target insects exceeding 100,000 specimens. Lastly, we hand collected 279 lepidopteran larvae to discern host-natural enemy associations on the target tree species. 
This presentation focuses on three areas: (1) how multitrophic data derived from natural history collections, the literature, and contemporary sampling are gathered and managed; (2) questions data gathered from those sources might address using this research as an example; and (3) how products from this research are disseminated.
Robert Kula (Robert.Kula@ars.usda.gov), USDA - ARS, Washington, DC
Digitizing natural history collection specimens to investigate the future of species conservation Jonathan Koch (jonathan.koch@usu.edu), Utah State University, Logan, UT
Harnessing specimen data to visualize and investigate the ecology of species Sarah Schmidt (schmidts@ku.edu), University of Kansas, Lawrence, KS
The usefulness of DNA-barcoding databases for routine taxonomic research and identification of Lepidoptera
Due to our extensive knowledge about the taxonomy of Lepidoptera and the ease with which the appropriate tissue samples can be obtained from dry specimens in collections, butterflies and moths have served as a model group for developing DNA barcoding methodology. DNA barcoding has now become a routine tool in taxonomy, and many species of Lepidoptera have been barcoded at least once. For example, the Barcode of Life initiative has produced almost a million such sequences for 84,000 species of Lepidoptera (half of the world’s described fauna). However, most species are represented by a single or few barcodes. In my presentation, using my own taxonomic work as well as several examples from the work of my colleagues at the McGuire Center for Lepidoptera and Biodiversity, I would like to argue that many more resources need to be invested into mitochondrial DNA barcoding, generating not only barcodes for species and subspecies that have not been barcoded to date, but also representing as many populations and individuals as possible. While genomic methods are increasingly popular for the purposes of phylogenetic reconstruction and evolutionary research, and currently dominate grant proposals, I argue that it is equally important to direct more resources towards DNA barcoding, which has proved to be the best taxonomic tool developed in the last 100 years for resolving current taxonomic conundrums, for revealing cryptic species and for describing biodiversity.
Andrei Sourakov (asourakov@flmnh.ufl.edu), University of Florida, Gainesville, FL
The intersection of data domains underlying insect systematics: case studies in parasitic Hymenoptera
Taxonomy/systematics has long been a highly integrative science, uniting data from domains as diverse as geography, time, phenotypic features, genotypes, literature, and media. The core of biodiversity discovery - examination of the features of individual organisms - remains constant, but collection, storage, and sharing of data have transformed the field. Geographic data can be recorded more accurately through the development of new tools and liberation from the space restrictions of specimen labels. The magnitude of error in these data over time and the fitness for use of legacy data will be discussed. Features of organisms can be studied from both a geographic and temporal point of view. As examples, the geographic variation in sex ratio of Pelecinus polyturator and the distribution of mimetic/apomorphic color patterns in Scelioninae are presented. Taxonomic workflows before and after the introduction of databasing technologies are compared, focusing on increased accuracy and completeness. These enhancements come with costs in both time and money, and these costs are quantified. The rate of taxonomic activity over the past 250 years is compared for different taxa, and the social dimension of this work in terms of gender, language, and geographic location is discussed. The introduction and adoption of informatics technologies have altered the skill set needed by the next generation of taxonomists, but this is not yet reflected in formal training programs.
Norman Johnson (johnson.2@osu.edu), The Ohio State University, Columbus, OH
Preventing Bugs in Data Analysis: Data Skills to Improve the Reliability and Effectiveness of Entomological Research
Our increasing capacity to collect data is changing science. This is particularly true as specimen data are being digitized and the availability of data is no longer the bottleneck. There is great potential for discovery, but we are largely failing to translate this sea of data into scientific advances, because researchers are not trained in the skills needed for effective management and analysis. The question then becomes, in addition to scaling data production and computation, how do we develop and deliver training that scales up the number of data-literate researchers? Course curricula are slow to change, need qualified instructors, and are already full. Short courses are oversubscribed and reach a limited number of participants. To provide scalable and distributed training, Data Carpentry develops and teaches domain-specific hands-on workshops in data organization, management, and analysis. This is a grassroots training effort developed by practitioners for practitioners, who identify core skills and collaboratively develop lessons. All lessons are open source, and workshops are taught by volunteers trained by the Software Carpentry Foundation. With iDigBio, a focus has been on training in the biodiversity community. Workshops are designed for people with little to no prior computational experience and teach in two days how to organize and clean data, manage data in SQL, and analyze and visualize data in R, covering the full data lifecycle. Workshops are in high demand, but this model allows for scaling of training and teaches the foundational skills to get biologists started managing and analyzing their data effectively.
Tracy Teal (tkteal@datacarpentry.org), Michigan State University, East Lansing, MI
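As a rough illustration of the clean, store, and summarize sequence described in the abstract above, here is a minimal sketch. It is not taken from any Data Carpentry lesson: it uses Python with the standard-library sqlite3 module in place of the SQL and R tooling the workshops teach, and the records, table name, column names, and function names are all invented for illustration.

```python
import sqlite3

# Hypothetical raw specimen records, invented for illustration only.
RAW_RECORDS = [
    {"genus": " Melanoplus ", "state": "fl", "year": "1998"},
    {"genus": "melanoplus", "state": "GA", "year": "2004"},
    {"genus": "Pelecinus", "state": "OH", "year": ""},
    {"genus": "Pelecinus", "state": "oh", "year": "2011"},
]


def clean(record):
    """Trim whitespace, normalize case, and turn an empty year into None."""
    year = record["year"].strip()
    return (
        record["genus"].strip().capitalize(),
        record["state"].strip().upper(),
        int(year) if year else None,
    )


def load(records):
    """Store cleaned records in an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimens (genus TEXT, state TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO specimens VALUES (?, ?, ?)",
        [clean(r) for r in records],
    )
    return conn


def counts_by_genus(conn):
    """Summarize the table: number of records per genus."""
    cursor = conn.execute(
        "SELECT genus, COUNT(*) FROM specimens GROUP BY genus ORDER BY genus"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    connection = load(RAW_RECORDS)
    print(counts_by_genus(connection))
```

Cleaning before loading means inconsistent spellings such as " Melanoplus " and "melanoplus" are counted as one genus rather than two.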
Developing Best Practices for Data Management Across all Stages of the Data Life Cycle
Best practices for empirical data collection (experimental design, laboratory techniques) are often well-covered in undergraduate and graduate training, yet there has been less emphasis on managing the resulting data effectively. This is an increasingly important skill set; many funding agencies require data management plans, and journals are requiring that data pertaining to published articles be accessible. Researchers with good data management skills will be able to maximize the productivity of their research program, effectively and efficiently share their data with the scientific community, and potentially benefit from the re-use of their data by others. In this talk, I will highlight some of the pitfalls to be avoided when working with data and introduce example best practices and tools that will improve your data management skills and research program.
Amber Budden (aebudden@dataone.unm.edu), DataONE, Albuquerque, NM
Data capture methodologies in digitisation of bee pollinators
Digitisation is an activity that museums and academic institutions increasingly recognize, though many still do not embrace, as a means to boost the impact of collections for global research and society through improved access. Even so, many researchers still fail to realise the importance of the data capture methodologies used in digitisation. New opportunities exist to design and implement processes, using available technology, that support data capture and enable a range of research on the biodiversity of pollinators, making scientific collections increasingly relevant. While specimen digitisation is useful for all taxa, immense additional benefits come from the digitisation of bees. This group of organisms is of prime importance as they provide most of the world’s pollination ecosystem services. Through international collaborative efforts, the wealth of data in natural history museums and collections about the diversity, distribution and biology of bees may be utilised for international biodiversity efforts.
Nicole Fisher (Nicole.Fisher@csiro.au), Australian National Insect Collection (ANIC), Clayton, Australia
Arthropod collection digitization and networking across the New World
There are well over 500 million arthropod specimens housed in approximately 1,000 collections worldwide. Although reliable estimates are not available, it is likely that less than 5% of these specimens have been digitized, and the current rate of digitization is probably not even adequate to keep pace with the acquisition of new specimens. If we hope to achieve the goal of digitizing all specimens by 2050, we need to develop global networks that can overcome many of the constraints we face today. We will review the current holdings of arthropods in collections across continents and digitization efforts from data providers to aggregators. More specifically, we will assess the types of collaboration needed and the technological and social network areas that are developing to attain the goal of full digitization.
Neil Cobb, Northern Arizona University (NAU), and Edward Gilbert (egbot@asu.edu), Arizona State University, School of Life Sciences, Tempe, AZ
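The scale implied by the figures in the abstract above can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions. It treats "well over 500 million" as exactly 500 million and "less than 5%" as exactly 5%, so the result is a lower bound. It also takes 2016, the symposium year, as the starting point. The function name and parameters are hypothetical.

```python
def required_annual_rate(total_specimens, digitized_percent,
                         start_year, target_year, new_per_year=0):
    """Specimens per year needed to clear the backlog by target_year.

    new_per_year is an optional, assumed count of newly acquired
    specimens that must also be digitized each year.
    """
    years = target_year - start_year
    if years <= 0:
        raise ValueError("target_year must be later than start_year")
    backlog = total_specimens * (100 - digitized_percent) / 100
    return backlog / years + new_per_year


if __name__ == "__main__":
    # Assumed inputs: 500 million specimens, 5% digitized, 2016 to 2050.
    rate = required_annual_rate(500_000_000, 5, 2016, 2050)
    print(f"{rate:,.0f} specimens per year")
```

Under these assumptions the backlog is 475 million specimens over 34 years, or roughly 14 million specimens per year before any newly collected material is counted.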
Database before you label – the key to a digitized collections future
Digitization of millions of historic entomology specimens remains an enormous challenge. Our community should not make this challenge worse by generating newly collected, undigitized specimens. Entomologists in North America currently generate many tens of thousands of new specimens annually, which are added to our undigitized backlog. The University of Alaska Museum Insect Collection contains over 1 million specimens represented by ~230,000 database records, of which 82% have been collected since the year 2000. This talk will describe the rapid growth of our collection and database. The methods used are similar to those established by Costa Rica's INBio in the 1990s.
Derek S. Sikes (dssikes@alaska.edu), University of Alaska, Fairbanks, AK
Troubleshooting industrial insect digitisation
Natural history collections are one of the most important sources of biodiversity information and their digitization is essential for providing greater access to both researchers and the general public. Industrial approaches are needed in order to mobilise the vast numbers of specimens (up to 10 billion) accumulated by the natural history museums in the world. Following the experience of the Digital Collection Programme (DCP) in the Natural History Museum we explore several ways of optimising the digitization process of insect collections. Success is impossible without an organised approach to project management, staff buy-in and administrative support on all levels. Key elements of industrial digitisation are: detailed yet flexible workflows which can accommodate different kinds of digitised material; automation through software and hardware; appropriate staff management; and community involvement.
Vladimir Blagoderov (vlab@nhm.ac.uk) and Laurence Livermore, The Natural History Museum, Cromwell Road, London, England
DAMmed If You Do or Don’t: Life Cycles of Digital Assets
Imaging of specimens is now regular curatorial practice in entomological collections, complementing longer-standing efforts to capture label data and related information. Many different imaging approaches exist, but a common thread is that vast quantities of images are being amassed rapidly around the globe. Managing, preserving, and safeguarding this proliferation of images is critical to the success of digitizing entomological collections. This talk examines the life cycle of digital assets produced during imaging projects at the Yale Peabody Museum, with a focus on student-driven workflows in the Entomology Division and other curatorial departments. Once acquired, Peabody’s digital assets flow through its collections management system into a Yale University-wide digital asset management system (DAM). Peabody Entomology helped develop the Yale DAM, harmonize workflow and metadata from dissimilar campus units, and integrate several collections management systems with a single DAM endpoint. Adopting this infrastructure has allowed Peabody to disseminate its images and specimen metadata more broadly into “foreign” contexts, such as the Yale Library’s Finding Aid system and a campus asset discovery portal, alongside better-known biodiversity outlets for entomological collections such as GBIF and the National Science Foundation’s iDigBio initiative.
Lawrence Gall (lawrence.gall@yale.edu), Yale University, New Haven, CT
Involving undergraduates in the digital community: Leveraging collections preservation, research, and outreach through a network of natural history collections clubs
In February of 2013, nine students at Arkansas State University came together to form the Natural History Collections Curation Club (NHC3). This club was an innovative approach to resolving many issues facing the natural history collections at A-State. The university houses collections in many disciplines. The collections were primarily built in the 1960s and 1970s, and by 2013 several of the collections were in disrepair due to a lack of funding and support. The students of the club made it their goal to restore the collections by dedicating their time and helping to secure funding. These efforts have resulted in funding from the Dean of the College of Sciences and Mathematics for a part-time student worker in the collections, supplies for several projects including jars and ethanol for restoring the fish collections and materials to create two large specimen mounts, and trips to visit several natural history museums. The NHC3 has helped A-State become recognized in the collections field where it was previously unknown. The club has also helped other universities increase student interest and involvement in collections. To date, two other universities have active natural history collections clubs as a result of the A-State model. Beginning in the fall of 2015, these three clubs will form a network to reach out to other universities that may benefit from this model. Our goal is to use the Natural History Collections Club Network (NHCCN) as a platform to motivate students across the United States to become more involved in university specimen collections.
Kari Harris (kari.panhorst@smail.astate.edu), Arkansas State University, Jonesboro, AR