Databasing References

A Guide to Digitizing Insect Collections, by Sarah Ashworth and Jennifer Fogarty, 55 pages.

Pub Date
URL http://insects.oeb.harvard.edu/MCZ/PDFs/Guide_to_digitizing.pdf#
Description This guide is offered to help anyone wanting to database and/or digitally image their collections. It is written in very simple terms since this work should not require a computer expert. This guide is based on digitizing a type collection so it is more rigorous and careful than may be necessary for other collections, particularly the archive protocol. The user should take or leave whatever information they feel is necessary. However, if the guide is to be used for a type collection, it is recommended that the degree of rigor, if not the actual protocol, be matched. Three different imaging setups are described, from a very inexpensive solution to the top of the range. These descriptions are not intended to prescribe the best or only setups, but to inform others about the setups we are using now as a result of over 5 years of research, much of it with limited budgets. We developed these under the guidance of Dr. Piotr Naskrecki, with early and continued assistance by Dr. Gary Alpert and Dr. Brian Farrell. In a similar way we describe a FileMaker Pro implementation called MANTIS. It is one of many database solutions for managing taxonomic information, but is recommended since it is easy to use, runs on both PC and Mac, and can be downloaded free from the web. It is also the creation of Piotr Naskrecki.

BNHM Digitization Projects – Results of Survey.

Pub Date
URL http://interactiveu.berkeley.edu/gems/bnhmit/BNHMDigitizationProjects.doc
Description Results of a digitization survey by the Berkeley Natural History Museum assessing digitization efforts of various collections.

Collaborative databasing of North American Bee Collections (NSF-BRC Grant), by Yanega.

Pub Date
URL http://ecnweb.org/sites/default/files/21b_Yanega_2010.pdf#
Description A mostly pictorial report of bee databasing at ten bee collections.

Collection Data Registration at the Nationaal Herbarium Nederland: Data Guidelines, by L. P. M. Willemse, J. B. Mols.

Pub Date March 2007
URL http://www.nationaalherbarium.nl/virtual/Data-guidelines-NHN.pdf#
Description The aim of this protocol is to present a detailed description of data entry rules and data entry procedures in order to standardize data entry at the NHN. Doing this will create maximum compatibility between the various databases and facilitate data exchange. A complete list of all fields and their dimensions and characteristics is found in Annex A. For most fields, lookup lists (using F9) are available in BRAHMS; these should be used to enter the majority of the data. This protocol is in principle a guideline on what kind of information is stored in each field and how to enter the data if it is not available using the lookup lists.

Darwin Core Terms.

Pub Date 2011-10-26
URL http://rs.tdwg.org/dwc/terms/index.htm#
Description This quick guide provides a list of all current terms of the Darwin Core. The terms are organized by categories (in bold) in the index. The categories correspond to Darwin Core terms that are classes (terms that have other terms to describe them). The terms that describe a given class (the class properties) appear in the list immediately below the name of the category in the index. The index provides links to the term descriptions in the table below the index.
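
As a rough illustration of the class/property grouping described above, the sketch below groups a handful of Darwin Core terms under their classes and merges them into one flat record. The term names are Darwin Core terms; the specimen values and the `flatten` helper are invented for this example and do not come from the quick guide.

```python
# Hypothetical specimen record; all values are invented for illustration.
# Keys of the outer dict are Darwin Core classes, inner keys are class properties.
record_by_class = {
    "Occurrence": {"catalogNumber": "EX-0001", "recordedBy": "A. Collector"},
    "Event": {"eventDate": "2010-06-15"},
    "Location": {
        "country": "Mexico",
        "decimalLatitude": 19.43,
        "decimalLongitude": -99.13,
    },
    "Taxon": {"scientificName": "Apis mellifera"},
}


def flatten(grouped):
    """Merge the per-class property dicts into one flat term -> value record."""
    flat = {}
    for properties in grouped.values():
        flat.update(properties)
    return flat


flat_record = flatten(record_by_class)
```

A flat record of this kind is one plausible shape for a single row in a specimen data export, with the class grouping kept only as documentation.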

Digital Bee Collection Network: DBCNet (NSF-BRC Grant), by Yanega.

Pub Date
URL http://ecnweb.org/sites/default/files/21a_Yanega_2010.pdf#
Description Collaborative specimen databasing of bees at ten collections: AMNH, UC Riverside, UC Davis, UC Berkeley, CSCA, Cornell, UConn, Rutgers, Vermont, USDA Bee Systematics Lab. Includes a brief but important imaging protocol.

Digital Imaging of Biological Type Specimens: A Manual of Best Practice, Results from a study of the European Network for Biodiversity Information, by Christoph L. Häuser, Axel Steiner, Joachim Holstein, Malcolm J. Scoble.

Pub Date 2005
URL http://imsgbif.gbif.org/CMS_ORC/?doc_id=2429&download=1#
Description This book addresses a number of important issues about the digital imaging of biological objects. These topics were explored in two workshops organised by the European Network for Biodiversity Information (ENBI) and the Global Biodiversity Information Facility (GBIF). With the digital imaging of a growing number of biological objects, it has become of great importance to agree on common approaches and standards. Such standardization is particularly important for natural history specimens so as to compare specimens often with only subtle differences in morphology. Emerging technologies are leading to exciting new opportunities in scientific studies and the field of biodiversity is notable among them. This publication includes several chapters on important issues, including color management, image file management, metadata standards and practices, and several approaches to varying organismal groups.

Digital Imaging: Ethics, by D. Cromey, The University of Arizona.

Pub Date 2001-2005
URL http://swehsc.pharmacy.arizona.edu/exppath/resources/pdf/Digital_Imaging_Ethics.pdf#
Description Scientists are usually considered to be respected sources of information and there is the understanding within the scientific community that data must not be inappropriately manipulated or falsified. When this essay was first composed in 2001, there were very few written guidelines for scientists. Now some of the major professional societies have issued policy statements regarding digital imaging, and many scientific journals have revamped their instructions to authors to provide clearer guidance of how they require images to be handled. Publications like the Journal of Cell Biology have begun testing images in accepted articles to ensure compliance with their guidelines and the Office of Research Integrity (HHS) has been watching this issue closely. In this author’s experience the inappropriate manipulation of scientific digital images typically does not arise from an intent to deceive or to obscure information. More often the inappropriate manipulations are simply due to ignorance of basic principles. It seemed to this author that often what is needed is an explanation of why manipulations are right or wrong. These twelve guidelines are an attempt to address this issue. It should be noted that the author has extensive experience in the microscopic imaging of biological specimens and these guidelines reflect his personal experience in this field.

Digitizing the Yale Collections – it takes a Village, by L. Munstermann and L. Gall.

Pub Date 2010
URL http://ecnweb.org/sites/default/files/gall-ecn-posted.pdf#
Description Report presented at Entomology 2010 using KE EMu software; includes delineation of the protocols for curation, imaging, and databasing.

Exchangeable image file format for digital still cameras: Exif Version 2.2 (2002).

Pub Date 2002
URL http://www.exif.org/Exif2-2.PDF#
Description This standard specifies the formats to be used for images, sound, and tags in digital still cameras and in other systems handling the image and sound files recorded by digital still cameras.

From Ink to Electrons: Issues to be Considered, by Larry Speers, Agriculture and Agri-Food Canada.

Pub Date
URL http://www.canadensys.net/wp-content/uploads/montreal-2009-digitization.pdf#
Description A presentation outlining the challenges, needs, and uses of scientific collections data, including important decisions to be made about establishing and conducting a collection digitization program, and the potential problems with such projects and their data.

GBIF Training Manual 1: Digitization of Natural History Collections Data.

Pub Date 2008
URL http://www.infoandina.org/system/files/recursos/GBIF_TM1.pdf#
Description The Global Biodiversity Information Facility (GBIF) is a worldwide network that makes primary, scientific, biodiversity data (documented species occurrence data) from many sources openly available via the Internet. It does this by building an information infrastructure that interconnects hundreds of databases, and by promoting the digitisation and sharing of data that are not currently available via the Internet, such as those associated with specimens in natural history museums. This promotion of digitisation is approached in a number of ways: seed money awards to stimulate digitisation projects; the development (with partners) of community-accepted standards for data and metadata, as well as software tools that enable interconnectivity and interoperability; workshops for training in digitisation and data-sharing; and guides such as this training manual and its components. GBIF’s hope is to help collections and database personnel around the world share best practices in the tasks and operations required in building a web-based, global “natural history collection and herbarium” that can be accessed any time any where by any one via the Internet.

How to Digitize Large Insect Collections? Preliminary Results of the Dig Project, by D. Lampe, K-H. Striebing.

Pub Date
URL http://phthiraptera.info/Publications/9073.pdf#
Description In practice the overall efficiency of data-basing the inventory of traditional entomological collections depends on two factors: suitable software and management measures to ensure the highest possible data quality already in the input process. Lessons learned from the development of the specimen-based database BIODAT and preliminary results of the DIG-(Digitization of key Insect groups at ZFMK) project, which is especially designed to develop a 'good practice', recommend: (1) a lockstep programme for data-basing, (2) data entry of collection units & split record function, (3) visualisation of georeferenced location/sites during data entry, (4) semi-automatic/automatic data transformation from original format into additional alternative ones, (5) semiautomatic data transfer of taxa- and geo-referenced information units. Current activities deal with the introduction of semantic feedback mechanisms into the practice of data-basing entomological collections.

Image Capture and Processing: An Overview, Consortium of Pacific Northwest Herbaria, by Ben Legler.

Pub Date July 26, 2010
URL http://www.pnwherbaria.org/documentation/imaging-overview.pdf#
Description This document describes the general processes used for image capture, image processing, data capture, and data/image dissemination used by the Consortium of Pacific Northwest Herbaria, as carried out under the Consortium’s 2010-2013 collaborative NSF Grant (DBI0956414). Details are omitted in an attempt to provide an overall understanding of the process.

Imaging of Specimens: Issues to be Considered, by Larry Speers, Agriculture and Agri-Food Canada.

Pub Date
URL http://www.canadensys.net/wp-content/uploads/montreal-2009-imaging.pdf#
Description A short presentation on specimen vs. label imaging, with consideration of image type, format, storage, and work flow.

Initiating a Collection Digitisation Project, by C. K. Frazier, J. Wall, S. Grant.

Pub Date 2008, GBIF
URL http://www.gbif.org/orc/?doc_id=2176#
Description This document is designed to give the reader the confidence to get started and to make the right decisions when planning a natural history collection digitisation project. The authors have years of experience working with collections and they have instilled this expertise into this paper so one can more efficiently ask the right questions and make the appropriate plans prior to committing any resources to the task.

Innovative workflows for efficient data capture in an entomological collection: The MCZ Lepidoptera Rapid Data Capture Project, by P. Morris, R. Eastwood, L. Ford, et al.

Pub Date
URL http://ecnweb.org/sites/default/files/12_Eastwood_2010.pdf#
Description Presentation made at Entomology 2010. Significant detail on the rapid data capture project at Museum of Comparative Zoology, including an efficient workflow.

Moving Theory into Practice: Digital Imaging Tutorial.

Pub Date
URL http://www.library.cornell.edu/preservation/tutorial/tutorial_English.pdf#
Description An excellent online tutorial about digital imaging, including basic terminology, selection, conversion, quality, and metadata.

Natural History Specimen Digitization: Challenges and Concerns, by A. Vollmar, J. A. Macklin, L. S. Ford, Biodiversity Informatics, 7, pp. 93-112.

Pub Date 2010
URL https://journals.ku.edu/index.php/jbi/article/viewFile/3992/3806#
Description A survey on the challenges and concerns involved with digitizing natural history specimens was circulated to curators, collections managers, and administrators in the natural history community in the Spring of 2009, with over 200 responses received. The overwhelming barrier to digitizing collections was a lack of funding or issues directly related to funding, leaving institutions mostly responsible for providing the necessary support. The uneven digitization landscape leads to a patchy accumulation of records at varying qualities, and based on different priorities, ultimately influencing the data's fitness for use. The survey results also indicated that although the kind of specimens found in collections and their storage can be quite variable, there are many similar challenges across disciplines when digitizing including imaging, automated text scanning and parsing, geo-referencing, etc. Thus, better communication between domains could foster knowledge on digitization leading to efficiencies that could be disseminated through documentation of best practices and training.

New York Botanical Garden Virtual Herbarium Best Practices Guide.

Pub Date
URL http://sciweb.nybg.org/Science2/hcol/mtsc/NYBG_Best_Practices.doc
Description The purpose of this guide is to lay out the governing principles and procedures that have evolved over the ten years of experience with the NYBG Virtual Herbarium. Hopefully this document will be useful in future years in explaining the rationale behind the approach taken and decisions made along the way, and may be useful to other institutions who are just now embarking on a Virtual Herbarium project, or searching for comparative or benchmark data.

Not another frickin' database!, by Michael Wall.

Pub Date 2010
URL http://ecnweb.org/sites/default/files/13_Wall_2010.pdf#
Description This document reports on the Entomology Collection Health Online (ECHO) database at San Diego Natural History Museum. ECHO is not a specimen database. Rather, ECHO allows users to search the collection of taxa, assess the collection health of those taxa, and examine large images of drawers containing taxa. All drawers and Schmidt boxes in the collection are catalogued to the lowest determined taxonomic level. SDNHM contains 219 databased drawers. Data on curatorial health is available for approximately 42,100 specimens.

Principles of Data Quality, by A. Chapman.

Pub Date 2005, GBIF
URL http://imsgbif.gbif.org/CMS_ORC/?doc_id=1229&download=1#
Description There are many data quality principles that apply when dealing with species data and especially with the spatial aspects of those data. These principles are involved at all stages of the data management process. A loss of data quality at any one of these stages reduces the applicability and uses to which the data can be adequately put. These include: Data capture and recording at the time of gathering, Data manipulation prior to digitisation (label preparation, copying of data to a ledger, etc.), Identification of the collection (specimen, observation) and its recording, Digitisation of the data, Documentation of the data (capturing and recording the metadata), Data storage and archiving, Data presentation and dissemination (paper and electronic publications, web-enabled databases, etc.), Using the data (analysis and manipulation). All these have an input into the final quality or “fitness for use” of the data and all apply to all aspects of the data – the taxonomic or nomenclatural portion of the data – the “what”, the spatial portion – the “where” and other data such as the “who” and the “when” (Berendsohn 1997). Before a detailed discussion on data quality and its application to species-occurrence data can take place, there are a number of concepts that need to be defined and described. These include the term data quality itself, the terms accuracy and precision that are often misapplied, and what we mean by primary species data and species-occurrence data.

Reference Model for an Open Archival Information System (OAIS), by the Consultative Committee for Space Data Systems.

Pub Date January 2002
URL http://public.ccsds.org/publications/archive/650x0b1.PDF#
Description This document is a technical recommendation for use in developing a broader consensus on what is required for an archive to provide permanent or indefinite long-term preservation of digital information. The recommendation establishes a common framework of terms and concepts which comprise an Open Archival Information System (OAIS). It allows existing and future archives to be more meaningfully compared and contrasted. It provides a basis for further standardization within an archival context and it should promote greater vendor awareness of, and support of, archival requirements. Through the process of normal evolution, it is expected that expansion, deletion, or modification of this document may occur. This Recommendation is therefore subject to CCSDS document management and change control procedures which are defined in Procedures Manual for the Consultative Committee for Space Data Systems.

Relational database design and implementation for biodiversity informatics, by P. J. Morris, PhyloInformatics 7: 1-66 - 2005.

Pub Date 2005
URL http://www.athro.com/general/Phyloinformatics_7_85x11.pdf#
Description The complexity of natural history collection information and similar information within the scope of biodiversity informatics poses significant challenges for effective long term stewardship of that information in electronic form. This paper discusses the principles of good relational database design, how to apply those principles in the practical implementation of databases, and examines how good database design is essential for long term stewardship of biodiversity information. Good design and implementation principles are illustrated with examples from the realm of biodiversity information, including an examination of the costs and benefits of different ways of storing hierarchical information in relational databases. This paper also discusses typical problems present in legacy data, how they are characteristic of efforts to handle complex information in simple databases, and methods for handling those data during data migration.

Report on trial of SatScan tray scanner system by SmartDrive Ltd., by V. Blagoderov, I. Kitching, T. Simonsen, V. Smith; Natural History Museum, London.

Pub Date March 8-April 9, 2010
URL http://vsmith.info/files/npre20104486-1.pdf#
Description Smartdrive Ltd. has developed a prototype imaging system, SatScan, that captures digitised images of large areas while keeping smaller objects in focus at very high resolution. The system was set up in the Sackler Biodiversity Imaging laboratory of the Natural History Museum on March 8, 2010 for a one-month trial. A series of projects imaging parts of the entomological, botanical, and palaeoentomological collections were conducted to assess the system's utility for museum collection management and biodiversity research. The technical and practical limitations of the system were investigated as part of this process. The SatScan system facilitates the capturing of a very large number of good quality images in a very short time. Large parts of the NHM collection could be digitised in dorsal view extremely quickly. These images have a wide range of uses across research, collection management, and public engagement. Scalability of the system is limited by our desire to assign unique identifiers (a number and/or a barcode) to specimens, and the cropping of these images. Without these identifiers digitised images will have limited long term value. The assignment of specimen level identifiers is potentially labour intensive. Options for assigning identifiers were not investigated as part of this trial but include the use of physical labels on each specimen (with significant resource implications and a significant volunteer workforce) and the use of virtual identifiers (as a virtual layer over the image, perhaps automatically assigned, and possibly coupled with physical labels attached to specimens as dictated by recuration activities). Intuitive software (with a web interface) needs to be developed to facilitate this process, including support for cropping of an image and the automatic assignment and printing of an identifier label.
On-demand assignment of identifiers would allow us to prioritize the digitisation but it will represent a significant change to the way we curate our collections and would require sustained and ongoing support from Collection Management. Additional concerns relate to the amount of storage space required to manage images, connection with existing digital systems and the utility of dorsal images for certain parts of the collection. These problems need to be addressed as part of a larger scale study.

Scientific Collections: Mission-Critical Infrastructure for Federal Science Agencies, A Report of the Interagency Working Group on Scientific Collections.

Pub Date 2009
URL http://www.whitehouse.gov/sites/default/files/sci-collections-report-2009-rev2.pdf#
Description This report represents the first step in an ongoing process of identifying and characterizing scientific collections and determining their long-term stewardship needs. Robust interagency collaboration will remain vital as we develop a systematic approach to safeguarding these scientific treasures for generations of scientists. Also see: https://www.ida.org/upload/stpi/pdfs/ida-d-3694-final.pdf

Semi-automated workflows for acquiring specimen data from label images in herbarium collections, Taxon, Volume 59, Number 6, pp. 1830-1842.

Pub Date December 2010
URL http://www.ingentaconnect.com/content/iapt/tax/2010/00000059/00000006/art00014#
Description Computational workflow environments are an active area of computer science and informatics research; they promise to be effective for automating biological information processing for increasing research efficiency and impact. In this project, semi-automated data processing workflows were developed to test the efficiency of computerizing information contained in herbarium plant specimen labels. Our test sample consisted of Mexican and Central American plant specimens held in the University of Michigan Herbarium (MICH). The initial data acquisition process consisted of two parts: (1) the capture of digital images of specimen labels and of full-specimen herbarium sheets, and (2) creation of a minimal field database, or "pre-catalog", of records that contain only information necessary to uniquely identify specimens. For entering "pre-catalog" data, two methods were tested: key-stroking the information (a) from the specimen labels directly, or (b) from digital images of specimen labels. In a second step, locality and latitude/longitude data fields were filled in if the values were present on the labels or images. If values were not available, geo-coordinates were assigned based on further analysis of the descriptive locality information on the label. Time and effort for the various steps were measured and recorded. Our analysis demonstrates a clear efficiency benefit of articulating a biological specimen data acquisition workflow into discrete steps, which in turn could be individually optimized. First, we separated the step of capturing data from the specimen from most keystroke data entry tasks. We did this by capturing a digital image of the specimen for the first step, and also by limiting initial key-stroking of data to create only a minimal "pre-catalog" database for the latter tasks. By doing this, specimen handling logistics were streamlined to minimize staff time and cost.
Second, by then obtaining most of the specimen data from the label images, the more intellectually challenging task of label data interpretation could be moved electronically out of the herbarium to the location of more highly trained specialists for greater efficiency and accuracy. This project used experts in the plants' country of origin, Mexico, to verify localities, geography, and to derive geo-coordinates. Third, with careful choice of data fields for the "pre-catalog" database, specimen image files linked to the minimal tracking records could be sorted by collector and date of collection to minimize key-stroking of redundant data in a continuous series of labels, resulting in improved data entry efficiency and data quality.
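
The third point above (sorting pre-catalog records by collector and date so that shared label data is keyed once per series) can be sketched roughly as follows. This is a minimal sketch only: the field names, barcodes, and records are hypothetical and are not taken from the paper or from the MICH database.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical pre-catalog rows: only what is needed to identify a specimen
# and link it to its label image. All values are invented.
pre_catalog = [
    {"barcode": "0003", "collector": "Smith", "date": "1987-03-02", "image": "0003.jpg"},
    {"barcode": "0001", "collector": "Lopez", "date": "1990-07-11", "image": "0001.jpg"},
    {"barcode": "0002", "collector": "Smith", "date": "1987-03-02", "image": "0002.jpg"},
]

# Sort by collector, then collection date, so a collector's series is contiguous.
sort_key = itemgetter("collector", "date")
ordered = sorted(pre_catalog, key=sort_key)

# Each batch shares collector and date, so those fields need to be typed only
# once per batch rather than once per label.
batches = [
    (key, [row["barcode"] for row in rows])
    for key, rows in groupby(ordered, key=sort_key)
]
```

Here `batches` holds one entry per collector/date pair together with the barcodes of the label images that a data-entry operator would work through consecutively.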

Specimen Imaging Documentation: Consortium of Pacific Northwest Herbaria, Version 4.0.

Pub Date November 11, 2011
URL http://www.pnwherbaria.org/documentation/imaging-documentation-v4.pdf#
Description Detailed, step-by-step documentation for herbarium imaging/label capture from the Consortium of Pacific Northwest Herbaria.

The AIC Guide to Digital Photography and Conservation Documentation

Pub Date Second Edition, 2011
URL http://www.conservation-us.org/resources/our-publications/special-projects/the-aic-guide#.WWUCocaZPfA
Description AIC has published the long-awaited second edition of the AIC Guide to Digital Photography and Conservation Documentation. This book is a comprehensive guide to digital photographic equipment, software, and processing tailored to the needs of conservation professionals. Authors Franziska Frey, Dawn Heller, Dan Kushel, Timothy Vitale, Jeffrey Warda (editor), and Gawain Weaver have more than doubled the size of the first edition. The second edition includes major extensions and updates to the text and is fully illustrated with over 120 color figures. It also has a wraparound internal spiral binding, allowing the book to lie flat.

The use of specimen label images for efficient data acquisition in research collections cataloguing: Workflow, by Inigo Granzow-de la Cerda, Juan Carlos Gomez-Martinez, Jose Luis Garcia-Castillo.

Pub Date
URL http://tdwg2006.tdwg.org/fileadmin/2006meeting/slides/GranzowCerda_ImagesM%C3%A9xMichCatalog_abs0098.pdf#
Description A presentation about an NSF-BRC project to digitize specimens of Mexican plants at the University of Michigan Herbarium, including consideration of equipment, workflow, and databasing.
Last modified on 11 July 2017, at 12:55