Databasing References

From iDigBio
Revision as of 10:16, 20 December 2011

Title PubDate URL Description
A Guide to Digitizing Insect Collections, by Sarah Ashworth and Jennifer Fogarty, 55 pages. http://insects.oeb.harvard.edu/MCZ/PDFs/Guide_to_digitizing.pdf# This guide is offered to help anyone wanting to database and/or digitally image their collections. It is written in very simple terms since this work should not require a computer expert. This guide is based on digitizing a type collection, so it is more rigorous and careful than may be necessary for other collections, particularly the archive protocol. The user should take or leave whatever information they feel is necessary. However, if the guide is to be used for a type collection, it is recommended that the degree of rigor, if not the actual protocol, be matched. Three different imaging setups are described, from a very inexpensive solution to the top of the range. These descriptions are not intended to prescribe the best or only setups, but to inform others about the setups we are using now as a result of over 5 years' research, much of it with limited budgets. We developed these under the guidance of Dr. Piotr Naskrecki, with early and continued assistance by Dr. Gary Alpert and Dr. Brian Farrell. In a similar way we describe a FileMaker Pro implementation called MANTIS. It is one of many database solutions for managing taxonomic information, but is recommended since it is easy to use, runs on both PC and Mac, and can be downloaded free from the web. It is also the creation of Piotr Naskrecki.
A Strategic Plan for Establishing a Network Integrated Biocollections Alliance, produced by NSF. http://digbiocol.files.wordpress.com/2010/08/niba_brochure.pdf# This report is a strategic plan for a 10-year effort to digitize and mobilize the scientific information associated with biological specimens held in U.S. research collections. The primary objective of the initiative is to create a national collections resource that will contribute critical information to U.S. scientific research and technology interests, and will aid in understanding the biodiversity dimensions and societal consequences of climate change, species invasions, natural disasters, the spread of disease vectors and agricultural pests and pollinators, and other environmental issues. Network Integrated Biocollections Alliance (NIBA) resources such as databases, network portals, and analytical tools will synthesize information contained in the nation’s collections and place them into national service for stakeholders in government, academia, business, K-12 education, informal science education, and the public.
Accelerating Taxonomic Discovery Through Automated Character Extraction, by J. LaSalle, Q. Wheeler, P. Jackway, S. Winterton, D. Hobern, D. Lovell, Zootaxa 2217:43-55. 2009 http://www.mapress.com/zootaxa/2009/f/zt02217p055.pdf# This paper discusses the following key messages. Taxonomy is (and taxonomists are) more important than ever in times of global change. Taxonomic endeavour is not occurring fast enough: in 250 years since the creation of the Linnean Systema Naturae, only about 20% of Earth’s species have been named. We need fundamental changes to the taxonomic process and paradigm to increase taxonomic productivity by orders of magnitude. Currently, taxonomic productivity is limited principally by the rate at which we capture and manage morphological information to enable species discovery. Many recent (and welcomed) initiatives in managing and delivering biodiversity information and accelerating the taxonomic process do not address this bottleneck. Development of computational image analysis and feature extraction methods is a crucial missing capacity needed to enable taxonomists to overcome the taxonomic impediment in a meaningful time frame.
Advanced Techniques for Imaging Parasitic Hymenoptera (Insecta), by M. L. Buffington, R.A. Burks, L. McNeil, American Entomologist. Spring, 2005 http://www.entsoc.org/PDF/Pubs/Periodicals/AE/AE-2005/Spring/Buffington.pdf# Digital imaging technology has revolutionized the practice of photographing insects for scientific study. This paper describes lighting techniques designed for imaging parasitic Hymenoptera in the superfamilies Chalcidoidea and Cynipoidea. Techniques described here are applicable to all small insects, as well as other invertebrates. The key to these techniques is the correct balance of light intensity and light dispersal. Once this balance is met, hymenopteran species as small as 0.75 mm can be readily imaged at a resolution suitable for publication. Surprisingly, a compound microscope can be used to image whole, unmounted insects in much the same way that a stereomicroscope is used.
Assembling the Custom Components for Specimen Imaging, Consortium of Pacific Northwest Herbaria, WTU Herbarium, Burke Museum, Version 1.0, by Ben Legler. 7/9/2010 http://www.pnwherbaria.org/documentation/custom-components-v1.pdf# This document provides instructions for assembling the custom hardware components used for imaging specimens under the Consortium of Pacific Northwest Herbaria’s 2010-2013 collaborative NSF Grant. It is intended as a guide for similar projects elsewhere. However, the components described here are specific to our choice of imaging equipment and may not be suitable for use elsewhere. Also discussed here are the custom software scripts used for metadata capture, image processing, and image tiling. The tiling script creates a version of the image that can be viewed with the Gmap Image Viewer (http://www.rmh.uwyo.edu/gmapviewer/about.php), an online, open-source viewer created for use with herbarium specimens.
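The tiling script mentioned above cuts a large specimen image into fixed-size tiles at successively halved resolutions so a Gmap-style viewer can fetch only what is on screen. The sketch below shows the arithmetic of such a pyramid; the 256-pixel tile size and halving scheme are common conventions assumed here, not details taken from Legler's actual script.

```python
import math

# Compute the tile grid for each zoom level of a halving image pyramid.
# The 256-px tile size and the halve-until-one-tile scheme are assumptions
# for illustration, not the Consortium's actual implementation.
def tile_pyramid(width, height, tile=256):
    """Return (width, height, tile_count) per zoom level, smallest level first."""
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((w, h, cols * rows))
        if cols == 1 and rows == 1:
            break  # the whole image fits in one tile; pyramid is complete
        w, h = max(1, w // 2), max(1, h // 2)
    return list(reversed(levels))

# e.g. a 4000 x 6000 px herbarium sheet scan:
for w, h, n in tile_pyramid(4000, 6000):
    print(w, h, n)
```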
Australian Museum Data capture of specimen labels using volunteers, by John Tann & Paul Flemons. December 2008 http://australianmuseum.net.au/Uploads/Documents/23183/Data%20Capture%20of%20specimen%20labels%20using%20volunteers%20-%20Tann%20and%20Flemons%202008.pdf# This is a report of an attempt to speed up the capture of information on the labels of specimens held by the Australian Museum. A trial was conducted using volunteers with a camera to photograph specimen labels and transcribe the data into a spreadsheet. Location information was georeferenced. The data in the resulting spreadsheet were then entered into EMu by a museum technician. Times and costs were compared to direct data entry, as well as to a previous trial using an off-shore data transcription service. The trial clarified the following. Importing data into EMu is not straightforward and is a specialist task. Having the data transcribed into a spreadsheet before import into EMu does not help. Errors, misspellings, and uncertainties on many of the labels meant that a spreadsheet of data became a clumsy and inefficient method of data entry. Photographing a label has advantages – a photograph becomes a verbatim record of the label in the database for later referral, and makes the data entry process quicker by about 20%, as well as easier and more convenient. Recommendations: The Australian Museum could train and use a small team of volunteers to photograph specimen labels. These photographs would be saved in EMu as a record of the label, and subsequently used for data entry by AM technical staff. Investigate the EMu inline toolset as a possible route for engaging volunteers for accurate and reliable data entry.
Australian Museum Rapid Digitisation Project: A Guide to Handling and Digitising Archival Material - Registers, by L. Prater, R. Stephens, and P. Flemons, 19 pp. August 2011 http://australianmuseum.net.au/Uploads/Documents/22932/Archive%20Training%20Compressed.pdf# This publication documents methods for digitizing printed records associated with museum collections.
Automontage Imaging Guidelines, by AntWeb. June, 2010 http://www.antweb.org/homepage/AntWeb%20Imaging%20guidelines%20v01.pdf# This presentation from AntWeb offers detailed information about imaging ant specimens using Automontage.
Avoiding twisted pixels: ethical guidelines for the appropriate use and manipulation of scientific digital images, by D. W. Cromey, Science and engineering ethics 16 (4) p. 639-67. 2010 http://www.ncbi.nlm.nih.gov/pubmed/20567932# Digital imaging has provided scientists with new opportunities to acquire and manipulate data using techniques that were difficult or impossible to employ in the past. Because digital images are easier to manipulate than film images, new problems have emerged. One growing concern in the scientific community is that digital images are not being handled with sufficient care. The problem is twofold: (1) the very small, yet troubling, number of intentional falsifications that have been identified, and (2) the more common unintentional, inappropriate manipulation of images for publication. Journals and professional societies have begun to address the issue with specific digital imaging guidelines. Unfortunately, the guidelines provided often do not come with instructions to explain their importance. Thus they deal with what should or should not be done, but not the associated 'why' that is required for understanding the rules. This article proposes 12 guidelines for scientific digital image manipulation and discusses the technical reasons behind these guidelines. These guidelines can be incorporated into lab meetings and graduate student training in order to provoke discussion and begin to bring an end to the culture of "data beautification".
Biodiversity Informatics in Action: Identification and Monitoring of Bee Species Using ABIS, by Tom Arbuckle, Stefan Schröder, Volker Steinhage, Dieter Wittmann, Proc. 15th International Symposium Informatics for Environmental Protection, pp. 425-430. October, 2001 http://citeseer.ist.psu.edu/viewdoc/summary?doi= Bees, as the main pollinators of food crops, represent a critical natural resource which needs to be carefully exploited and managed. In recent years, however, destruction of bees' native habitats, infestations, and displacements of native bees by alien bee species have reduced and disturbed bee populations, and this is already having considerable impact on global agriculture. A further concurrent problem is that there are probably fewer than 50 taxonomic experts worldwide able to identify bee species. ABIS (Automatic Bee Identification System) is a suite of software tools created for the identification and monitoring of bees. Bee species are rapidly and reliably determined from images of the bees' wings by means of linear and non-linear statistics in conjunction with image processing. Work is currently in progress to couple the bee identification tools within a geographic information system and to make a bee recognition service available over the Internet.
BNHM Digitization Projects – Results of Survey. http://interactiveu.berkeley.edu/gems/bnhmit/BNHMDigitizationProjects.doc Results of a digitization survey by the Berkeley Natural History Museum assessing digitization efforts of various collections.
Collaborative databasing of North American Bee Collections (NSF-BRC Grant), by Yanega. http://www.ecnweb.org/dev/files/21b_Yanega_2010.pdf# A mostly pictorial report of bee databasing at ten bee collections.
Collection Data Registration at the Nationaal Herbarium Nederland: Data Guidelines, by L. P. M. Willemse, J. B. Mols. March 2007 http://www.nationaalherbarium.nl/virtual/Data-guidelines-NHN.pdf# The aim of this protocol is to present a detailed description of data entry rules and data entry procedures in order to standardize data entry at the NHN. Doing this will create maximum compatibility between the various databases and facilitate data exchange. A complete list of all fields and their dimensions and characteristics is found in Annex A. For most fields, lookup lists (using F9) are available in BRAHMS; these should be used to enter the majority of the data. This protocol is in principle a guideline to what kind of information is stored in each field and how to enter the data if it is not available using the look-up lists.
Darwin Core Terms. 2011-10-26 http://rs.tdwg.org/dwc/terms/index.htm# This quick guide provides a list of all current terms of the Darwin Core. The terms are organized by categories (in bold) in the index. The categories correspond to Darwin Core terms that are classes (terms that have other terms to describe them). The terms that describe a given class (the class properties) appear in the list immediately below the name of the category in the index. The index provides links to the term descriptions in the table below the index.
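In practice, Darwin Core terms become the field names of a digitized specimen record. A minimal sketch (the specimen values and the choice of "required" terms are illustrative assumptions, not part of the standard itself, though the term names are real Darwin Core terms):

```python
# A minimal specimen occurrence keyed by Darwin Core term names
# (http://rs.tdwg.org/dwc/terms/). The values are invented examples.
occurrence = {
    "occurrenceID": "urn:catalog:EXAMPLE:Ento:12345",  # hypothetical identifier
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Apis mellifera Linnaeus, 1758",
    "eventDate": "1998-07-04",
    "decimalLatitude": 29.6516,
    "decimalLongitude": -82.3248,
    "geodeticDatum": "WGS84",
}

def missing_core_terms(record,
                       required=("occurrenceID", "basisOfRecord",
                                 "scientificName")):
    """List required Darwin Core terms absent from a record.
    Which terms count as required is a project decision, assumed here."""
    return [term for term in required if term not in record]

print(missing_core_terms(occurrence))  # []
```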
Digital Bee Collection Network : DBCNet ( NSF-BRC Grant ), by Yanega. http://www.ecnweb.org/dev/files/21a_Yanega_2010.pdf# Collaborative specimen databasing of bees at ten collections: AMNH, UC Riverside, UC Davis, UC Berkeley, CSCA, Cornell, Uconn, Rutgers, Vermont, USDA Bee Systematics Lab. Includes a brief but important imaging protocol.
Digital Imaging of Biological Type Specimens: A Manual of Best Practice, Results from a study of the European Network for Biodiversity Information, by Christoph L. Häuser, Axel Steiner, Joachim Holstein, Malcolm J. Scoble. 2005 http://imsgbif.gbif.org/CMS_ORC/?doc_id=2429&download=1# This book addresses a number of important issues about the digital imaging of biological objects. These topics were explored in two workshops organised by the European Network for Biodiversity Information (ENBI) and the Global Biodiversity Information Facility (GBIF). With the digital imaging of a growing number of biological objects, it has become of great importance to agree on common approaches and standards. Such standardization is particularly important for natural history specimens so as to compare specimens often with only subtle differences in morphology. Emerging technologies are leading to exciting new opportunities in scientific studies, and the field of biodiversity is notable among them. This publication includes several chapters on important issues, including color management, image file management, metadata standards and practices, and several approaches to varying organismal groups.
Digital Imaging: Ethics, by D. Cromey, The University of Arizona. 2001-2005 http://swehsc.pharmacy.arizona.edu/exppath/resources/pdf/Digital_Imaging_Ethics.pdf# Scientists are usually considered to be respected sources of information and there is the understanding within the scientific community that data must not be inappropriately manipulated or falsified. When this essay was first composed in 2001, there were very few written guidelines for scientists. Now some of the major professional societies have issued policy statements regarding digital imaging, and many scientific journals have revamped their instructions to authors to provide clearer guidance of how they require images to be handled. Publications like the Journal of Cell Biology have begun testing images in accepted articles to ensure compliance with their guidelines and the Office of Research Integrity (HHS) has been watching this issue closely. In this author’s experience the inappropriate manipulation of scientific digital images typically does not arise from an intent to deceive or to obscure information. More often the inappropriate manipulations are simply due to ignorance of basic principles. It seemed to this author that often what is needed is an explanation of why manipulations are right or wrong. These twelve guidelines are an attempt to address this issue. It should be noted that the author has extensive experience in the microscopic imaging of biological specimens and these guidelines reflect his personal experience in this field.
Digitizing the Yale Collections – it takes a Village, by L. Munstermann and L. Gall. 2010 http://www.ecnweb.org/dev/files/gall-ecn-posted.pdf# Report presented at Entomology 2010 on digitization using KE EMu software; includes delineation of the protocols for curation, imaging, and databasing.
Exchangeable image file format for digital still cameras : Exif Version 2.2 (2002). 2002 http://www.exif.org/Exif2-2.PDF# This standard specifies the formats to be used for images, sound, and tags in digital still cameras and in other systems handling the image and sound files recorded by digital still cameras.
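Exif metadata is stored in TIFF-style Image File Directory (IFD) entries, each 12 bytes: tag (2 bytes), field type (2 bytes), count (4 bytes), and value-or-offset (4 bytes). The sketch below decodes one little-endian ("II" byte order) entry; it is a minimal illustration of the structure described in the Exif 2.2 specification, not a full Exif reader.

```python
import struct

# Decode a single 12-byte TIFF/Exif IFD entry in little-endian byte order.
# (Exif files may also use big-endian "MM" order; this sketch assumes "II".)
def parse_ifd_entry(data: bytes):
    tag, ftype, count, value = struct.unpack("<HHII", data[:12])
    return {"tag": tag, "type": ftype, "count": count,
            "value_or_offset": value}

# Example entry: tag 0x0112 (Orientation), type 3 (SHORT), count 1, value 1.
entry = struct.pack("<HHII", 0x0112, 3, 1, 1)
print(parse_ifd_entry(entry))
```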
From Ink to Electrons: Issues to be Considered, by Larry Speers, Agriculture and Agri-Food Canada. http://www.canadensys.net/wp-content/uploads/montreal-2009-digitization.pdf# A presentation outlining the challenges, needs, and uses of scientific collections data, including important decisions to be made about establishing and conducting a collection digitization program, and the potential problems with such projects and their data.
GBIF TRAINING MANUAL 1: DIGITIZATION OF NATURAL HISTORY COLLECTIONS DATA. 2008 http://www.infoandina.org/system/files/recursos/GBIF_TM1.pdf# The Global Biodiversity Information Facility (GBIF) is a worldwide network that makes primary, scientific, biodiversity data (documented species occurrence data) from many sources openly available via the Internet. It does this by building an information infrastructure that interconnects hundreds of databases, and by promoting the digitisation and sharing of data that are not currently available via the Internet, such as those associated with specimens in natural history museums. This promotion of digitisation is approached in a number of ways: seed money awards to stimulate digitisation projects; the development (with partners) of community-accepted standards for data and metadata, as well as software tools that enable interconnectivity and interoperability; workshops for training in digitisation and data-sharing; and guides such as this training manual and its components. GBIF’s hope is to help collections and database personnel around the world share best practices in the tasks and operations required in building a web-based, global “natural history collection and herbarium” that can be accessed any time any where by any one via the Internet.
How to Digitize Large Insect Collections? Preliminary Results of the DIG Project, by D. Lampe, K-H. Striebing. http://phthiraptera.info/Publications/9073.pdf# In practice the overall efficiency of data-basing the inventory of traditional entomological collections depends on two factors: suitable software and management measures to ensure the highest possible data quality already in the input process. Lessons learned from the development of the specimen-based database BIODAT and preliminary results of the DIG (Digitization of key Insect groups at ZFMK) project, which is especially designed to develop a 'good practice', recommend: (1) a lockstep programme for data-basing, (2) data entry of collection units & split record function, (3) visualisation of georeferenced locations/sites during data entry, (4) semi-automatic/automatic data transformation from the original format into additional alternative ones, (5) semi-automatic data transfer of taxa- and geo-referenced information units. Current activities deal with the introduction of semantic feedback mechanisms into the practice of data-basing entomological collections.
Image Capture and Processing: An Overview, Consortium of Pacific Northwest Herbaria, by Ben Legler. July 26, 2010 http://www.pnwherbaria.org/documentation/imaging-overview.pdf# This document describes the general processes used for image capture, image processing, data capture, and data/image dissemination used by the Consortium of Pacific Northwest Herbaria, as carried out under the Consortium’s 2010-2013 collaborative NSF Grant (DBI0956414). Details are omitted in an attempt to provide an overall understanding of the process.
Imaging of Specimens: Issues to be Considered, by Larry Speers, Agriculture and Agri-Food Canada. http://www.canadensys.net/wp-content/uploads/montreal-2009-imaging.pdf# A short presentation on specimen vs. label imaging, with consideration of image type, format, storage, and work flow.
Initiating a Collection Digitisation Project, by C. K. Frazier, J. Wall, S. Grant. 2008, GBIF http://www.gbif.org/orc/?doc_id=2176# This document is designed to give the reader the confidence to get started and to make the right decisions when planning a natural history collection digitisation project. The authors have years of experience working with collections and they have instilled this expertise into this paper so one can more efficiently ask the right questions and make the appropriate plans prior to committing any resources to the task.
Innovative workflows for efficient data capture in an entomological collection: The MCZ Lepidoptera Rapid Data Capture Project, by P. Morris, R. Eastwood, L. Ford, et al. http://www.ecnweb.org/dev/files/12_Eastwood_2010.pdf# Presentation made at Entomology 2010. Significant detail on the rapid data capture project at the Museum of Comparative Zoology, including an efficient workflow.
Moving Theory into Practice: Digital Imaging Tutorial. http://www.library.cornell.edu/preservation/tutorial/tutorial_English.pdf# An excellent online tutorial about digital imaging, including basic terminology, selection, conversion, quality, and metadata.
NATURAL HISTORY SPECIMEN DIGITIZATION: CHALLENGES AND CONCERNS, by A. Vollmar, J. A. Macklin, L. S. Ford, Biodiversity Informatics, 7, pp. 93-112. 2010 https://journals.ku.edu/index.php/jbi/article/viewFile/3992/3806# A survey on the challenges and concerns involved with digitizing natural history specimens was circulated to curators, collections managers, and administrators in the natural history community in the spring of 2009, with over 200 responses received. The overwhelming barrier to digitizing collections was a lack of funding or issues directly related to funding, leaving institutions mostly responsible for providing the necessary support. The uneven digitization landscape leads to a patchy accumulation of records of varying quality, based on different priorities, ultimately influencing the data's fitness for use. The survey results also indicated that although the kinds of specimens found in collections and their storage can be quite variable, there are many similar challenges across disciplines when digitizing, including imaging, automated text scanning and parsing, geo-referencing, etc. Thus, better communication between domains could foster knowledge on digitization, leading to efficiencies that could be disseminated through documentation of best practices and training.
New York Botanical Garden Virtual Herbarium Best Practices Guide. http://sciweb.nybg.org/Science2/hcol/mtsc/NYBG_Best_Practices.doc# The purpose of this guide is to lay out the governing principles and procedures that have evolved over the ten years of experience with the NYBG Virtual Herbarium. Hopefully this document will be useful in future years in explaining the rationale behind the approach taken and decisions made along the way, and may be useful to other institutions that are just now embarking on a Virtual Herbarium project, or searching for comparative or benchmark data.
Not another frickin' database!, by Michael Wall. 2010 http://www.ecnweb.org/dev/files/13_Wall_2010.pdf# This document reports on the Entomology Collection Health Online (ECHO) database at the San Diego Natural History Museum. ECHO is not a specimen database. Rather, ECHO allows users to search the collection of taxa, assess the collection health of those taxa, and examine large images of drawers containing taxa. All drawers and Schmidt boxes in the collection are catalogued to the lowest determined taxonomic level. SDNHM contains 219 databased drawers. Data on curatorial health are available for approximately 42,100 specimens.
Principles of Data Quality, by A. Chapman. 2005, GBIF http://imsgbif.gbif.org/CMS_ORC/?doc_id=1229&download=1# There are many data quality principles that apply when dealing with species data and especially with the spatial aspects of those data. These principles are involved at all stages of the data management process. A loss of data quality at any one of these stages reduces the applicability and uses to which the data can be adequately put. These include: data capture and recording at the time of gathering; data manipulation prior to digitisation (label preparation, copying of data to a ledger, etc.); identification of the collection (specimen, observation) and its recording; digitisation of the data; documentation of the data (capturing and recording the metadata); data storage and archiving; data presentation and dissemination (paper and electronic publications, web-enabled databases, etc.); and using the data (analysis and manipulation). All these have an input into the final quality or “fitness for use” of the data, and all apply to all aspects of the data – the taxonomic or nomenclatural portion (the “what”), the spatial portion (the “where”), and other data such as the “who” and the “when” (Berendsohn 1997). Before a detailed discussion on data quality and its application to species-occurrence data can take place, there are a number of concepts that need to be defined and described. These include the term data quality itself, the terms accuracy and precision that are often misapplied, and what we mean by primary species data and species-occurrence data.
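Spatial data quality of the kind Chapman discusses lends itself to simple automated checks during digitisation. The sketch below shows three such checks; the specific rules are common-sense illustrations chosen here, not a list taken from Chapman (2005).

```python
# Three illustrative "fitness for use" checks on georeferenced records.
# The rules below are common-sense examples assumed for illustration.
def check_coordinates(lat, lon):
    """Return a list of spatial data-quality problems for a record."""
    problems = []
    if not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if not -180 <= lon <= 180:
        problems.append("longitude out of range")
    if lat == 0 and lon == 0:
        problems.append("0,0 coordinates (often a null placeholder)")
    return problems

print(check_coordinates(29.65, -82.32))  # []
print(check_coordinates(0, 0))
```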
Reference Model for an Open Archival Information System (OAIS), by the Consultative Committee for Space Data Systems. January 2002 http://public.ccsds.org/publications/archive/650x0b1.PDF# This document is a technical recommendation for use in developing a broader consensus on what is required for an archive to provide permanent or indefinite long-term preservation of digital information. The recommendation establishes a common framework of terms and concepts which comprise an Open Archival Information System (OAIS). It allows existing and future archives to be more meaningfully compared and contrasted. It provides a basis for further standardization within an archival context and it should promote greater vendor awareness of, and support of, archival requirements. Through the process of normal evolution, it is expected that expansion, deletion, or modification of this document may occur. This Recommendation is therefore subject to CCSDS document management and change control procedures which are defined in the Procedures Manual for the Consultative Committee for Space Data Systems.
Relational database design and implementation for biodiversity informatics, by P. J. Morris, PhyloInformatics 7: 1-66 - 2005. 2005 http://www.athro.com/general/Phyloinformatics_7_85x11.pdf# The complexity of natural history collection information and similar information within the scope of biodiversity informatics poses significant challenges for effective long term stewardship of that information in electronic form. This paper discusses the principles of good relational database design, how to apply those principles in the practical implementation of databases, and examines how good database design is essential for long term stewardship of biodiversity information. Good design and implementation principles are illustrated with examples from the realm of biodiversity information, including an examination of the costs and benefits of different ways of storing hierarchical information in relational databases. This paper also discusses typical problems present in legacy data, how they are characteristic of efforts to handle complex information in simple databases, and methods for handling those data during data migration.
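One of the trade-offs Morris examines is how to store hierarchical (e.g., taxonomic) information in a relational database. The sketch below shows the adjacency-list approach in SQLite, with a recursive query recovering a species' higher classification; the table layout and data are invented for illustration, not taken from the paper.

```python
import sqlite3

# Adjacency-list storage of a taxonomic hierarchy: each taxon row points
# at its parent row. Table layout and data are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE taxon (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES taxon(id))""")
con.executemany("INSERT INTO taxon VALUES (?, ?, ?)", [
    (1, "Hymenoptera", None),    # order (root of this fragment)
    (2, "Apidae", 1),            # family
    (3, "Apis", 2),              # genus
    (4, "Apis mellifera", 3),    # species
])

# A recursive common table expression walks from the species up to the
# root, recovering the full lineage without redundant rank columns.
lineage = [name for (name,) in con.execute("""
    WITH RECURSIVE up(id, name, parent_id) AS (
        SELECT id, name, parent_id FROM taxon WHERE name = 'Apis mellifera'
        UNION ALL
        SELECT t.id, t.name, t.parent_id
        FROM taxon t JOIN up ON t.id = up.parent_id)
    SELECT name FROM up""")]
print(lineage)  # ['Apis mellifera', 'Apis', 'Apidae', 'Hymenoptera']
```

The alternative Morris weighs is denormalizing ranks into fixed columns (kingdom, family, genus, ...), which is simpler to query but costly to restructure when the classification changes.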
Report on trial of SatScan tray scanner system by SmartDrive Ltd., by V. Blagoderov, I. Kitching, T. Simonsen, V. Smith; Natural History Museum, London. March 8-April 9, 2010 http://vsmith.info/files/npre20104486-1.pdf# Smartdrive Ltd. has developed a prototype imaging system, SatScan, that captures digitised images of large areas while keeping smaller objects in focus at very high resolution. The system was set up in the Sackler Biodiversity Imaging laboratory of the Natural History Museum on March 8, 2010 for a one-month trial. A series of projects imaging parts of the entomological, botanical, and palaeoentomological collections were conducted to assess the system's utility for museum collection management and biodiversity research. The technical and practical limitations of the system were investigated as part of this process. The SatScan system facilitates the capturing of a very large number of good quality images in a very short time. Large parts of the NHM collection could be digitised in dorsal view extremely quickly. These images have a wide range of uses across research, collection management, and public engagement. Scalability of the system is limited by our desire to assign unique identifiers (a number and/or a barcode) to specimens, and the cropping of these images. Without these identifiers digitised images will have limited long term value. The assignment of specimen level identifiers is potentially labour intensive. Options for assigning identifiers were not investigated as part of this trial but include the use of physical labels on each specimen (with significant resource implications and a significant volunteer workforce) and the use of virtual identifiers (as a virtual layer over the image, perhaps automatically assigned, and possibly coupled with physical labels attached to specimens as dictated by recuration activities).
Intuitive software (with a web interface) needs to be developed to facilitate this process, including support for cropping of an image and the automatic assignment and printing of an identifier label. On-demand assignment of identifiers would allow us to prioritize the digitisation but it will represent a significant change to the way we curate our collections and would require sustained and ongoing support from Collection Management. Additional concerns relate to the amount of storage space required to manage images, connection with existing digital systems and the utility of dorsal images for certain parts of the collection. These problems need to be addressed as part of a larger scale study.
Scientific Collections: Mission-Critical Infrastructure for Federal Science Agencies, A Report of the Interagency Working Group on Scientific Collections. 2009 http://www.whitehouse.gov/sites/default/files/sci-collections-report-2009-rev2.pdf# This report represents the first step in an ongoing process of identifying and characterizing scientific collections and determining their long-term stewardship needs. Robust interagency collaboration will remain vital as we develop a systematic approach to safeguarding these scientific treasures for generations of scientists. Also see: https://www.ida.org/upload/stpi/pdfs/ida-d-3694-final.pdf
Semi-automated workflows for acquiring specimen data from label images in herbarium collections, Taxon, Volume 59, Number 6, pp. 1830-1842. December 2010 http://www.ingentaconnect.com/content/iapt/tax/2010/00000059/00000006/art00014# Computational workflow environments are an active area of computer science and informatics research; they promise to be effective for automating biological information processing for increasing research efficiency and impact. In this project, semi-automated data processing workflows were developed to test the efficiency of computerizing information contained in herbarium plant specimen labels. Our test sample consisted of Mexican and Central American plant specimens held in the University of Michigan Herbarium (MICH). The initial data acquisition process consisted of two parts: (1) the capture of digital images of specimen labels and of full-specimen herbarium sheets, and (2) creation of a minimal field database, or "pre-catalog", of records that contain only information necessary to uniquely identify specimens. For entering "pre-catalog" data, two methods were tested: key-stroking the information (a) from the specimen labels directly, or (b) from digital images of specimen labels. In a second step, locality and latitude/longitude data fields were filled in if the values were present on the labels or images. If values were not available, geo-coordinates were assigned based on further analysis of the descriptive locality information on the label. Time and effort for the various steps were measured and recorded. Our analysis demonstrates a clear efficiency benefit of articulating a biological specimen data acquisition workflow into discrete steps, which in turn could be individually optimized. First, we separated the step of capturing data from the specimen from most keystroke data entry tasks. 
We did this by capturing a digital image of the specimen for the first step, and also by limiting initial key-stroking of data to create only a minimal "pre-catalog" database for the latter tasks. By doing this, specimen handling logistics were streamlined to minimize staff time and cost. Second, by then obtaining most of the specimen data from the label images, the more intellectually challenging task of label data interpretation could be moved electronically out of the herbarium to the location of more highly trained specialists for greater efficiency and accuracy. This project used experts in the plants' country of origin, Mexico, to verify localities and geography and to derive geo-coordinates. Third, with careful choice of data fields for the "pre-catalog" database, specimen image files linked to the minimal tracking records could be sorted by collector and date of collection to minimize key-stroking of redundant data in a continuous series of labels, resulting in improved data entry efficiency and data quality.
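The "pre-catalog" idea in this workflow can be sketched in a few lines: a minimal record holds only what is needed to identify a specimen and link it to its label image, and sorting by collector and date groups continuous label series so redundant data is keyed once. The field names and class below are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PreCatalogRecord:
    # Minimal fields to uniquely identify a specimen; richer label data
    # (locality, geo-coordinates) is transcribed later from the label image.
    barcode: str
    collector: str
    collection_date: str   # date string as read from the label
    label_image: str       # path or URI of the label image
    locality: str = ""     # filled in later from the label image
    latitude: Optional[float] = None
    longitude: Optional[float] = None

def sort_for_entry(records):
    # Sorting by collector, then date, groups continuous label series so
    # shared data (e.g. locality) need only be keyed once per series.
    return sorted(records, key=lambda r: (r.collector, r.collection_date))
```

A later pass would fill in `locality`, `latitude`, and `longitude` from the label images, or assign geo-coordinates from the descriptive locality text when they are absent.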
Specimen Imaging Documentation: Consortium of Pacific Northwest Herbaria, Version 4.0. November 11, 2011 http://www.pnwherbaria.org/documentation/imaging-documentation-v4.pdf# Detailed, step-by-step documentation for herbarium imaging/label capture from the Consortium of Pacific Northwest Herbaria.
The use of specimen label images for efficient data acquisition in research collections cataloguing: Workflow, Iñigo Granzow-de la Cerda, Juan Carlos Gomez-Martinez, Jose Luis Garcia-Castillo. http://tdwg2006.tdwg.org/fileadmin/2006meeting/slides/GranzowCerda_ImagesM%C3%A9xMichCatalog_abs0098.pdf# A presentation about an NSF-BRC project to digitize specimens of Mexican plants at the University of Michigan Herbarium, including consideration of equipment, workflow, and databasing.
Uses of Primary Species-Occurrence Data, by A. Chapman. 2005, GBIF http://www.nlbif.nl/news_en/files/UsesPrimaryData.pdf# This paper examines uses for primary species occurrence data in research, education, and in other areas of human endeavour, and provides examples from the literature of many of these uses. The paper examines not only data from labels, or from observational notes, but the data inherent in museum and herbarium collections themselves, which are long-term storage receptacles of information and data that are still largely untouched. Projects include the study of species and their distributions through both time and space, their use for education, both formal and public, for conservation and scientific research, use in medicine and forensic studies, in natural resource management and climate change, in art, history and recreation, and for social and political use. Uses are many and varied and may well form the basis of much of what we do as people every day.
Utility (and Shortcomings) of High Resolution Drawer Imaging for Remote Curation and Outreach, by M. Bertone, A. Deans, North Carolina State University. 2010 http://www.ecnweb.org/dev/files/17_Bertone_2010.pdf# A very good presentation to Entomology 2010 about NCSU's drawer imaging system using Gigapan technology.
VertNet: A New Model for Biodiversity Data Sharing, by Heather Constable, Robert Guralnick, John Wieczorek, Carol Spencer, A. Townsend Peterson, The VertNet Steering Committee, PLoS Biology, Volume 8, Issue 2, e1000309. February 2010 http://www.plosbiology.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pbio.1000309&representation=PDF# A paper on the vertebrate biodiversity networks. The fundamental concept underlying the vertebrate biodiversity networks is that data contributors are the primary and authoritative source for information about the occurrence data over which they have custody. The networks merely facilitate access and sharing of these distributed primary resources. A fully decentralized architecture, with all requests distributed directly to the primary sources, highlighted the primacy of the contributing institutions and was an essential phase in promoting participation, instilling confidence and a sense of control within the community.