Object To Image To Data Workshop
Introduction
A Workshop co-sponsored by iDigBio and the Scientific Software Innovation Institutes (S2I2) will address the issues of digitization workflows, digitization bottlenecks, and digitization technology needs. Effective and accurate identification of technology needs is best derived by first understanding and optimizing workflows (which are often taxon-specific) and then identifying and minimizing bottlenecks within those workflows. The Workshop will focus on the process of converting an object (e.g., a specimen) to an image, and then to data. The planning committee also recognizes that some institutions may follow a workflow that proceeds from object to data to image; while this approach is not preferred, it will be addressed to some degree during the Workshop.
This Wiki will be used as a collaboration tool during the planning, execution, and post-Workshop documentation phases of the Workshop.
Key Dates
- 2/9/2012: The S2I2 Workshop proposal was reviewed and approved by the iDigBio Steering Committee
- 2/2012: Agenda planning was initiated; Gil Nelson, Deb Paul and Amanda Neill met to discuss Workshop objectives and topics
- 3/7/2012: A Workshop planning meeting was held and tasks were assigned. Attendees: Amanda Neill, Christopher Norris, Greg Riccardi, Jim Beach, Gil Nelson, Deb Paul, Jason Grabon
- 3/21/2012: The next scheduled meeting of the planning committee
Decision Log
- The Workshop will run from May 30th at 8:00 AM to May 31st at 5:00 PM. Attendees will arrive on May 29th.
Task List
Due Date | Responsible | Task | Status |
---|---|---|---|
3/9/2012 | All (Planning Committee) | Provide Jason Grabon with recommendations for core Workshop attendees, including the organism group represented by each potential participant | Incomplete |
3/12/2012 | All (Planning Committee) | Jason Grabon will consolidate the list of recommended attendees and re-distribute to the planning committee for final consensus of core invitees | Incomplete |
3/16/2012 | Jason Grabon | Contact core attendees to determine additional suggestions for key participants | Incomplete |
3/16/2012 | Jim Beach | Define Workshop participant roles (pre-Workshop roles, roles to be filled during the Workshop, and post-Workshop roles) | Incomplete |
3/16/2012 | Amanda Neill | Provide the planning committee with a draft of the Workshop agenda | Incomplete |
3/16/2012 | Jason Grabon | Identify a workflow management expert (manufacturing, business processes, etc.) at the University of Florida who is willing to participate in and contribute to the Workshop | Incomplete |
3/21/2012 | All (Planning Committee) | Finalize the list of pre-selected Workshop invitees | Incomplete |
3/26/2012 | Jason Grabon | Send invitations to Workshop participants | Incomplete |
3/26/2012 | Jason Grabon | Initiate the open registration process (4/6/2012 application deadline) | Incomplete |
3/26/2012 | Gil Nelson, Deb Paul, Amanda Neill | Provide the planning committee with a draft of the pre-Workshop survey | Incomplete |
3/30/2012 | All (Planning Committee) | Finalize the pre-Workshop survey | Incomplete |
Week of 4/2/2012 | Shari Ellis | Distribute the pre-Workshop survey to confirmed Workshop participants | Incomplete |
4/6/2012 | All (Planning Committee) | Close the open registration process and select participants from applicant pool | Incomplete |
Workshop Proposal
Co-sponsored iDigBio-S2I2 workshop proposal - Object to Image to Data: Documentation of Digitization Workflow Models
Proposed workshop dates: May 30th 8:00 AM - May 31st 5:00 PM
S2I2 Leads: Chris Norris & Jim Beach
S2I2 Chair/Organizer: Amanda Neill
iDigBio Leads: Jose Fortes & Greg Riccardi
The digitization of information about the distribution of life on earth extends scientific work and workflows that began with the discovery of species in the field and laboratory, systematic identification and description, and the deposition of specimen vouchers in biological repositories. The goal of digitization is to add value to society’s monumental 400-year investment in collecting and curating samples of earth’s biological diversity by mobilizing the data associated with those specimens to the internet. The process of converting text and image information to digital formats should be the easy part of attaining this societal goal, given the extraordinary level of standardization associated with specimen creation and curation across countries and centuries. Tasks associated with the creation and optimization of information acquisition from biological specimens lend themselves to innovative and efficient technological approaches, and to optimization with the variety of computational, network, and logistics tools and services available in academia and through commercial and crowd-sourced services.
We propose a workshop to address the documentation and analysis of digitization workflows for biological specimens, with a primary focus on addressing those preservation types for which digital images can become the basis for further digitization steps. The goals of the meeting are to: (1) inform and train workshop participants in the use of lightweight business process modeling, which will then be used by participants to (2) create and document reference workflow models for represented disciplines and/or preservation types. The justification for describing digitization workflows by discipline and/or preparation type is that those classes define discrete sets of logistical and data-acquisition parameters based on the physical properties and legacy curatorial practice of each discipline/preparation type. Data from labels of snakes in jars, from minute labels of pinned insects, or from herbarium specimens each present unique challenges and opportunities for digitization.
The ultimate goal of producing reference workflow models for specimen digitization is to then employ them in:
- Evaluating newly proposed workflows, as tests and extensions of the reference model. Made available on the web, these models would be usable by current and newly created digitization efforts to evaluate their own approaches against a model at some level of maturity for completeness, outputs, efficiency, cost, staffing, technology application, and throughput.
- Identifying gaps within respective workflows where technology tools and services are needed to complete them, so that these can be prioritized as required technology for development, funding, and ongoing technical support.
- Identifying options for optimizing individual workflow tasks in various economic and technical support scenarios. Crowdsourcing might be the only viable solution for certain tasks given insufficient funding for keystroke data entry. Partial records might be the only attainable outcome for disciplines constrained by massive specimen overloads or unwieldy source material.
These workshop activities should meet the goals of iDigBio by encouraging communication and collaboration among domain experts who will document technology needs and requirements via workflow modeling exercises and group exploration of innovative solutions to the object-image-data bottleneck. Workshop funds will not be used in a session to plan grants, or to develop grant proposals. Workshop deliverables will feed into the various software innovation programs being developed in the context of NSF’s Cyberinfrastructure Framework for 21st Century Science and Engineering.
Workshop Outcomes:
- A community-developed collections workflow site showing tasks that diverge among disciplines/preparation types and those that are in common across multiple workflows (it is hoped that iDigBio would moderate and host this).
- A call to action to discipline-based communities to identify resources and tools to complete missing areas of the workflows.
- A model for group assessment of differing technological solutions applied to workflow steps.
- A model for evaluating the economic efficiency of individual steps, their location and precedence in the workflow, and the economic (including logistic, HR, collaborative goal) efficiency of the overall workflow.
- A possible scalable, affordable, consensus technology solution for label capture applicable to multiple collection types (or a record of such a vision for future application).
- Publication of these findings in Collections Forum, PLoS ONE, or appropriate society/discipline journals.