Workshop Report: Developing an ontology for insect natural history data

The “natural history” of an organism refers to what that organism does – how it develops, behaves, and interacts with other organisms.  Natural history information is fundamental to multiple fields of study in biology, but the natural history of many organisms remains poorly known.  This is especially true for insects, the most diverse animals on our planet.  So, given any insect species, we might ask: What do we know about its natural history?  How do we know it?  The answers to these questions come from observation of insects, either in the field or under laboratory conditions.  Data from such observations are typically widely scattered, difficult to discover and analyze, and highly heterogeneous.  Aggregating these data so that they can be analyzed and disseminated efficiently, without information loss, is a major informatics challenge.

At this three-day workshop, we took the first steps toward developing a robust ontology for insect natural history data and natural history observations.  We primarily focused on information from insect specimen labels, but we also recognized considerable overlap in content between natural history data from labels and natural history data from literature sources, so much of our work should be readily adaptable to literature-sourced data.  We anticipate that our ontology will provide computable semantics for insect natural history data and observations, which will, in turn, facilitate rich, automated, and reproducible data integration and aggregation.  During the workshop, we focused our efforts on five major tasks: assembling example data, analyzing the example data and scoping the ontology, designing the high-level ontology and identifying concepts, identifying users and use cases, and authoring competency questions.  In addition to these main workshop activities, we also listened to short presentations from several workshop participants about related ontologies and software for ontology design and data modeling.

We spent several hours during the first day of the workshop assembling example natural history data for each of the five major insect orders (Coleoptera, Diptera, Hemiptera, Hymenoptera, and Lepidoptera).  For this task, we split into five small groups, with each group working on one insect order.  We focused on information from insect specimen labels, but we also included some literature-based data.  Our goal was to cover, as thoroughly as possible, the various kinds of information recorded on specimen labels for each order, with an emphasis on capturing the breadth of biological information and observational detail.

We continued working in small groups to analyze the kinds of information contained in the example data.  Here the goal was to begin delineating the scope of the ontology.  For each order, we summarized the kinds of biological information that were observed (e.g., various multi-organism interactions, developmental data) and the ways in which the information was recorded (e.g., qualitative or quantitative).  Then, we reconvened as a large group, each small group reported their findings, and we synthesized the results to construct a set of the high-level conceptual areas required for the final ontology.

We then split into two groups to sketch out basic ontology design patterns for the highest priority conceptual areas and to identify the entities (concepts) to include in these conceptual areas.  We determined that observations and observing processes, relationships and interactions, and positional (spatial) information were the most critical conceptual areas to start with.  After working in separate groups for most of Thursday afternoon and part of Friday morning, each group reported their results and we discussed some critical ontology design challenges related to modeling observations.

The last major activity of the workshop was drafting detailed ontology competency questions and identifying potential users and use cases.  Competency questions provide a means for testing an ontology – they specify how an ontology will be used to answer queries about real data.  Thus, writing competency questions goes hand-in-hand with determining an ontology's users and use cases.  We initially worked in three groups, with each group working independently to develop competency questions and use cases.  By the end of this exercise, we had a large set of competency questions covering a wide range of application domains, including basic biological research (e.g., evolutionary biology and ecology) and applied fields (e.g., conservation biology and agriculture).
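
To illustrate how a competency question can act as a test, the following minimal Python sketch phrases one such question ("Which taxa have been observed as parasitoids of a given host, and how do we know?") as a query over a handful of toy statements.  Every term, identifier, and record in it is a hypothetical placeholder invented for illustration; none of it comes from the workshop's draft ontology or from real specimen data.

```python
# Toy subject-predicate-object statements. All class names, property names,
# and taxa below are hypothetical placeholders, not terms from the ontology.
TRIPLES = {
    ("obs1", "type", "Observation"),
    ("obs1", "observed_interaction", "int1"),
    ("obs1", "source", "specimen_label"),
    ("int1", "type", "ParasitoidOf"),
    ("int1", "agent", "wasp_sp_A"),
    ("int1", "target", "caterpillar_sp_B"),
    ("obs2", "type", "Observation"),
    ("obs2", "observed_interaction", "int2"),
    ("obs2", "source", "literature"),
    ("int2", "type", "FeedsOn"),
    ("int2", "agent", "caterpillar_sp_B"),
    ("int2", "target", "plant_sp_C"),
}


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """Return all subjects s such that (s, predicate, obj) is asserted."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def parasitoids_of(host):
    """Competency question: which taxa were observed as parasitoids of
    `host`, and which observation (with which source) supports each?"""
    results = []
    interactions = subjects("target", host) & subjects("type", "ParasitoidOf")
    for interaction in interactions:
        for obs in subjects("observed_interaction", interaction):
            for agent in objects(interaction, "agent"):
                for source in objects(obs, "source"):
                    results.append((agent, obs, source))
    return sorted(results)


print(parasitoids_of("caterpillar_sp_B"))
```

If a question like this cannot be expressed against a draft ontology's classes and relations, or returns nothing for data known to contain an answer, that points to a gap in the design.  In this sketch, each answer also carries the supporting observation and its source, which is one possible way of addressing the "How do we know it?" question.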

Over three days of hard work, we laid a solid foundation for further ontology design, development, and implementation efforts.  We plan to use the momentum generated by this workshop to help us reach the longer-term goal of a complete draft ontology implementation.