Digitization Workflow Workshop Report

Wed, 2012-06-06 12:05 -- jgrabon

Developing Robust Object-to-Image-to-Data (DROID) Workflow Workshop

30-31 May 2012, Florida Museum of Natural History, University of Florida (FLMNH)

Biological specimens document the historical and modern occurrence of plant and animal species, and they are the source of most of what we know about the diversity and distribution of life on Earth. Most collected specimens have yet to be digitized, and current biodiversity digitization processes and technologies are often inefficient and uncoordinated, preventing timely and cost-effective digitization of these specimens. This research workshop focused on the design, documentation, and optimization of the workflows necessary to transform physical specimens collected in the field into useful, shareable, and manageable digital objects within a collection. Approximately twenty hands-on collections experts provided input during the workshop.

Why document workflows?

Workflow documentation is a powerful tool both within a collection and across the entire collections community. Internally, effective workflow documentation for a collection can highlight inefficiencies, identify bottlenecks that hinder throughput, and expose opportunities for automation. Workflow documentation also serves as initial input into the development of collections digitization training materials and checklists that improve quality and consistency. Collectively, the documentation and sharing of effective digitization workflows 1) enables collections to test and compare results in order to identify optimal processes, 2) prevents collections from investing resources in (re)designing a process that already exists within the community, 3) enhances communication and standardization by enabling agreement on a common workflow vocabulary for each task, and 4) exposes innovations to the entire community. Additionally, comprehensive workflow documentation enables the natural history collections community to approach digitization and technology innovators from other domains, such as library sciences, robotics development, industrial workflow design, or software development, for assistance. Documented workflows can be presented to these collaborators to learn about improved methods as well as innovative or re-purposed tools.

But we are unique!

The workshop participants recognized that various factors impact the design of appropriate workflows for a particular collection.

  • Tradeoffs must be determined at a high level (e.g., volume of objects digitized to text vs. completeness of each record). These decisions may be dependent upon grant requirements or other externally imposed requirements.
  • Local decisions and policies may impact a digitization workflow, including institutional or collection policies.
  • Specific workflow decisions within a collection will be based upon constraints such as the quantity of personnel, available expertise, available funds, physical layout of the collection space, the method of specimen preservation, and other factors.

To overcome these issues, the DROID workshop participants produced two recommendations. The first was to approach the challenge by developing workflows specific to three broad preservation types: 1) objects on flat sheets (typically plant specimens), 2) objects on pins (primarily insects), and 3) larger three-dimensional objects (fossils, mammals, reptiles, etc.). Specimens within each high-level preservation type are similar enough that workflows with a reasonable number of common tasks can be developed. Participants then divided into groups, each focused on the requirements for a specific type.

A second recommendation was to develop more generalized, flexible workflows, with common tasks grouped into "modules" that could be inserted, removed or re-ordered within a collection's workflow based upon the factors described above. Workshop participants were quickly able to begin identifying “plug-and-play” modules, as well as tasks within those modules. These modules include items that may not be identified when a digitization project is first initiated but later prove necessary to the workflow, such as project management tasks, pre-digitization curation, or determination of what degree of imaging is most appropriate (e.g., all specimens, exemplars, or no specimen images). To that end, the group also discussed specific tactics for documenting workflows, methods for measuring activities to determine the optimal workflow, and continuous quality improvement activities that should periodically occur during the execution of digitization activity. The social and psychological aspects of organizational change, employee and volunteer motivation, and process re-design were also discussed.
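The "plug-and-play" idea can be illustrated with a minimal Python sketch. It is not a tool produced by the workshop: the class names (Module, Workflow), the method names, and the individual task strings are hypothetical, chosen only to show how modules might be inserted, removed, or re-ordered within a collection's workflow. The module names themselves are taken from the examples mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Module:
    """A named group of related digitization tasks (hypothetical structure)."""
    name: str
    tasks: List[str] = field(default_factory=list)


@dataclass
class Workflow:
    """An ordered list of modules that a collection can rearrange."""
    preservation_type: str
    modules: List[Module] = field(default_factory=list)

    def names(self) -> List[str]:
        """Return the module names in their current order."""
        return [m.name for m in self.modules]

    def insert(self, module: Module, before: Optional[str] = None) -> None:
        """Insert a module before the named module, or append it at the end."""
        if before is None:
            self.modules.append(module)
            return
        index = self.names().index(before)  # raises ValueError if absent
        self.modules.insert(index, module)

    def remove(self, name: str) -> None:
        """Drop a module that does not apply to this collection."""
        self.modules = [m for m in self.modules if m.name != name]

    def reorder(self, order: List[str]) -> None:
        """Rearrange the modules to match the given sequence of names."""
        if sorted(order) != sorted(self.names()):
            raise ValueError("order must list every current module exactly once")
        by_name = {m.name: m for m in self.modules}
        self.modules = [by_name[n] for n in order]


# Illustrative task strings only; a real workflow would list documented tasks.
flat_sheets = Workflow(
    "objects on flat sheets",
    [
        Module("imaging", ["capture image", "check image quality"]),
        Module("data capture", ["transcribe label data"]),
    ],
)

# Modules that prove necessary after the project has started can be added.
flat_sheets.insert(
    Module("pre-digitization curation", ["check specimen condition"]),
    before="imaging",
)
flat_sheets.insert(Module("project management", ["track progress"]))
flat_sheets.reorder(
    ["project management", "pre-digitization curation", "imaging", "data capture"]
)

# A collection that decides on "no specimen images" removes the imaging module.
flat_sheets.remove("imaging")
print(flat_sheets.names())
```

Keeping each module as a self-contained unit means that a local decision, such as whether to image all specimens, exemplars, or none, changes only which modules appear in the list and not the content of the other modules.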

Pulling it all together

Moving forward, a Working Group of participants will have a modular workflow for "objects on flat sheets" drafted by the end of July. This draft workflow will be open to the community for comments to help improve and refine the documentation. A "final" version will ultimately be posted on the iDigBio website; however, optimized workflows and workflow modules will be subject to continuous improvement as new insights and tools are applied. Following workflow design and documentation for "objects on flat sheets", the group will draft workflows for additional preservation types, which will also be posted, released for comment, and then published as living documents.

Ultimately, the organizers and participants expect to produce digitization workflows that help the community in three ways:

  • Avoid Duplication of Effort - prevent others from "reinventing the wheel"
  • Take Small Steps - produce incremental improvements in digitization speed and quality
  • Take Giant Leaps - produce documentation that sparks and spreads innovation

The DROID workshop was organized by Integrated Digitized Biocollections (iDigBio), a National Resource Center at the University of Florida and Florida State University, in collaboration with the Botanical Research Institute of Texas, Yale University, and the University of Kansas. The workshop is supported by the U.S. National Science Foundation’s Office of Cyberinfrastructure and Directorate for Biological Sciences, through the Scientific Software Innovation Institutes (S2I2) and Advancing Digitization of Biodiversity Collections (ADBC) Programs.

We would like to thank all who contributed to the preparation of the workshop, who attended, and who continue the important work of finalizing the effort that was catalyzed by the workshop.

The agenda, workshop recordings, and presentations are available via links beside the DROID Workshop section of the iDigBio Workshop Summary Wiki Page.