iDigBio Informatics and Cyberinfrastructure Workshop

Building 105, University of Florida

(Noon, Wed. 28 March – 5pm, Fri. 30 March)

A focused workshop sponsored by iDigBio will be held to discuss, define, and distill the standards and cyberinfrastructure requirements for scientific collections and the biodiversity sciences.  An operating assumption is that the biodiversity informatics community has already made significant headway on certain tools, practices, and standards, and that the crucial task now is to identify critical gaps in infrastructure and training that can become a further focus of community activities.

This workshop will bring together experts in biodiversity informatics and related fields (e.g., library and information sciences).  Workshop participation is by invitation only in order to assemble the best possible cross-section of experience while maintaining a smaller group size that is conducive to open discussion.  It will not be possible in a 2.5-day workshop to cover every technology need, but several priorities have been identified through the recent ADBC PI Summit and other venues.  Areas of priority include, but are not limited to, 1) harmonizing data and metadata requirements for digital collections, 2) examining semantic and linked-data approaches to biodiversity data interoperability, 3) developing training and testing resources for georeferencing, and 4) defining storage capabilities, interoperability, and computational requirements for the national resource.  Stakeholders include NSF-funded initiatives (e.g., TCNs, VertNet, BiSciCol, Filtered Push, Specify, and iDigBio), standards organizations (such as TDWG, GSC, and Dublin Core), collections data managers, and library and information scientists.

Discussion topics and breakout groups will focus on:

·         Minimal Information for Scientific Collections (MISC) standards.  A number of research domains in biology have identified minimum requirements for sharing data (MIAPA for phylogenetics, MIxS in genomics).  The biodiversity informatics community has a wealth of existing standards, which itself imposes a need to distill and harmonize across this landscape.  Defining a MISC will require expert assessment and selection of a superset or intersection of existing standards, e.g., Darwin Core, Dublin Core, Audubon Core, GSC MIxS (minimal standards for genomics), and EML (Ecological Metadata Language).  In the context of biodiversity science drivers, are there identifiable gaps in our understanding and use of data and metadata specifications?  For example, the proper context of occurrences, CollectionObjects, and DigitalObjects is not fully resolved.

·         Biodiversity Semantics ontology.  Build on new and existing efforts to establish a community ontology relevant to collections digitization, e.g., the TDWG DarwinCore SW, linked data efforts, and the GSC.  This discussion will focus on due diligence:  identifying needs, use cases, stakeholders, starting points, existing gaps, and a relevant scope that can be extended further at a Biodiversity Semantics workshop (planned for May 2012 by the RCN4GSC) and in possible iDigBio working group activities.  In this problem space, Darwin Core includes a class for ResourceRelationship.  What are community best practices for describing relationships?  Can we utilize outcomes of the MISC discussion and existing vocabulary terms to set the stage for further development?

·         Georeferencing scientific collections.  Significant effort within the biodiversity community has already established guidelines and best practices for georeferencing specimens, along with the necessary tools and software.  Current priorities are establishing an online test suite and training resources.  iDigBio can effectively contribute support for online resources and coordination of training activities.

·         Functional cyberinfrastructure requirements.  Establish an initial baseline of needs and requirements for reference implementation(s) of interoperable resources, cloud ecosystems, virtual appliances, and linked data efforts within the context of scientific collections.

Outcomes of the workshop are expected to include documents and recommendations for working groups that would continue activities identified as further priorities.  Where appropriate, breakout groups will generate functional requirements and use cases for the identified workshop topics, including cost/time estimates, that can be translated into technical requirements for system design.  Agreed-upon standards will be documented and published to encourage interoperability with other organizations interested in publishing digitized data to iDigBio.  Where possible, TCNs and other partners will be asked to provide a preliminary set of digitized data to test the mechanisms to share, store, and synchronize data and training materials.

