Webinar: Towards user-definable, semi-automated workflows for curating biodiversity data

Did you miss the webinar? Or are there parts you'd like to hear again? UPDATE: Click to listen to the Kurator Webinar, recorded 28 May 2015.

In the FilteredPush project, we have developed automated workflows for quality control of biodiversity data, first as proof-of-concept desktop software in the Kepler Kuration package and then as analysis tools embedded in FilteredPush network nodes. These nodes produce data-quality reports for end users but can also be installed and run independently by end users on their data sets. The Kurator project builds on this foundation to give users tools they can use to assemble exactly the tests they want to perform on their data. The current roster of tools provides a small set of canned workflows with some configuration options.

In this webinar, we will demonstrate the existing set of Kurator tools for quality control of biodiversity data, assist participants in setting up this software to run on their own data sets, examine the data quality reports that the tools produce, and discuss the next steps in the Kurator project.

Currently, Kurator provides canned workflows with some configuration options; these workflows target the data-quality needs of the Thematic Collections Networks (TCNs) that we support. The actors in the workflow check scientific names, georeferences, and collecting event dates in the input records for internal consistency and against external authoritative services. They also produce a data-quality report, which includes suggested modifications to the input data along with provenance data about the data sources consulted by the workflow and the assertions made by the workflow. A post-processing program takes this output and renders it as a spreadsheet, which can be configured for either of two audiences: a scientific user of the data (focus on validated elements in the data set) or a data curator (focus on potential problems found in the data, grouped into units of work).

TCNs using, or planning to use, this quality control workflow include SCAN TCN, NEVP TCN, and InvertEBase TCN.

The Kurator project is working to generalize the FilteredPush Quality Control (QC) workflow approach to produce more flexible, user-composable workflows by using state-of-the-art scientific workflow automation methodologies. Although the current QC workflow is internally an automated workflow, to users it appears as a monolithic piece of software that behaves like an ordinary Java application. In contrast, workflows built with the new Kurator framework will consist of separate, independent units of code that users can assemble without our help.

Presenters (alphabetical order): David Lowery, James A. Macklin, Timothy McPhillips, Paul J. Morris, Robert A. Morris, Tianhong Song

Where: http://idigbio.adobeconnect.com/datamgmt

Part 1 of Webinar (for everyone): Overview and discussion of Kurator

Part 2 of Webinar (for IT-oriented folks who want to install and test the tools): please go to http://wiki.datakurator.org/wiki/iDigBioWebinar_May2015 and follow the instructions there. You will then have opportunities in the second half of the webinar to provide input on the use of this tool.

Find us on Twitter: #datakurator #filteredpush

Brought to you by: the iDigBio Data Management Interest Group and the iDigBio Cyberinfrastructure Working Group

Start Date: 
Thursday, May 28, 2015 - 2:00pm to 4:00pm EDT
Location: 
Harvard University
City: 
Cambridge
State: 
Massachusetts
Remote Connection URL: 
http://idigbio.adobeconnect.com/datamgmt