Field to Database

From iDigBio

Quick Links for Field to Database
Field to Database Workshop Agenda
Field to Database Workshop Biblio Entries
Field to Database Workshop Report (Workshop Blog)
This wiki supports the short course Field to Database: Biodiversity Informatics and Data Management Skills for Specimen Based Research, held at iDigBio, University of Florida, from March 9 to 12, 2015. It is the third in a series of four biodiversity informatics workshops planned in collaboration with the Tri-Trophic Thematic Collection Network for iDigBio during 2014-2015. The fourth workshop in the series, held Sept 15-16, 2015, focuses on Data Management for Collection Managers.

Apply Now

Workshop is full. Application Form is closed.

General Information

This workshop aims to investigate current trends in collecting and to focus on best practices and skills for collecting and sharing robust, fit-for-research-use data. The 4-day short course is hands-on, mixing lectures with field work, participant exercises, and presentations.

Planning Team

Deb Paul (iDigBio), Katja Seltmann (TTD-TCN, AMNH), François Michonneau (FLMNH - iDigBio), Derek Masaki (USGS - BISON), Pam Soltis (FLMNH - iDigBio PI), Shari Ellis (iDigBio), Kevin Love (iDigBio)

About

Skill Level

Some experience with R is required. If you are new to R, please take an introductory R course before the workshop. There are several good options:

Try R (Code School course)
Intro to R (Coursera course, starts Feb 2nd)
Beginner Course: Up and Running with R with Barton Poulson (course at lynda.com)
Intermediate Course: R Statistics Essential Training with Barton Poulson (course at lynda.com)

Instructors: François Michonneau (FLMNH - iDigBio), Katja Seltmann (TTD-TCN, AMNH), Derek Masaki (USGS), Matt Collins (ACIS - iDigBio)
Assistants: Deborah Paul (FSU - iDigBio), Matt Cannister (USGS)
Who: The course is aimed at graduate students, postdocs, research staff, and other researchers.
Where: iDigBio in Gainesville, FL
Requirements:

Participants must bring a laptop with a few specific software packages installed.
Participants must have some knowledge of R; this is not a beginner-level course. Introductions to R that you can take on your own before the workshop are listed above.
If you will be traveling from out of town, you will need to make your own travel arrangements.

Contact: Please email Deb Paul, dpaul@fsu.edu for questions and information not covered here.
Twitter: #field2db @idigbio

Tuition for the course is free, but prior registration is required for attending. You can register here.

Software Installation Requirements

Software needed for Field to Database Course at iDigBio

Mac OS X
Text Editor
We recommend TextWrangler. In a pinch, you can use nano, which should be pre-installed.
RStudio + R
Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.
Spreadsheet
If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
PC
Text Editor
Notepad++ is a popular free code editor for Windows. Be aware that you must add its installation directory to your system path to launch it from the command line (or to have other tools, like Git, launch it for you). Instructions for modifying your path are available online here; please ask your instructor to help you do this.
RStudio + R
Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE.
Spreadsheet
If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
Linux
Text Editor
Kate is one option for Linux users. In a pinch, you can use nano, which should be pre-installed.
RStudio + R
You can download the binary files for your distribution from CRAN, or use your package manager (e.g., on Debian/Ubuntu run sudo apt-get install r-base). Also, please install the RStudio IDE.
Spreadsheet
If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
  • Before the workshop, you must confirm by RSVP that the required software is installed. Instructors are available to help; see your email for their contact information.

We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below, and install the Adobe Connect Add-In so that you can participate in the workshop.

Goals

  • Investigate, observe, and discover leading-edge trends in field collecting.
  • Provide examples of best practices for collecting and sharing data such as field data, identifiers, trait data, and environmental variables.
  • Explore data tools, including software such as R as well as field apps.
  • Convey the concept and importance of reproducible research workflows, and methods for creating them.
  • Illustrate how data gets from the field into a collection database and into an aggregator's database.
  • Discuss how data gets published and discovered.

Objectives

  • Students participate in field collecting with subject-matter experts and present what changes they plan to make to their collecting practices in a workshop presentation.
  • Subject-matter experts share what they have learned from observing and talking with others about this topic.
  • Students work through examples to demonstrate mastery of skills for transforming, enhancing, standardizing data.
  • Through comments, discussion, and possibly a post-workshop survey, students demonstrate that they grasp the importance of metadata and understand the conceptual difference between data and metadata.
  • Students write a post-workshop blog post or prepare a report or presentation to synthesize what they learned and pay it forward.

Our curriculum overview

Day 1: Why a Field-to-Database Biodiversity Informatics Workshop? On Site Field Demos from Invited Experts from Paleontology, Ornithology, Ecology, Marine Science, Entomology, and Botany
Day 2: Student 3-minute presentations. General issues in field data collection to data synthesis. Getting started with R.
Day 3: Data exploration using R. Import and display. From raw data to technically correct data. From technically correct data to consistent data. File output. Writing processed data to file.
Day 4: Using R to access biodiversity APIs. Publishing data on iDigBio. Publishing data on DataDryad. Review, Wrap-up, Survey, Next Steps.
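The Day 3 sequence (raw data, then technically correct data, then consistent data, then file output) can be sketched in a few lines. The workshop teaches this in R; the Python below is only an illustration. Here "technically correct" is taken to mean every value has the right type or is explicitly missing, and "consistent" to mean the values also pass domain rules. The sample records, the Darwin Core column names, and the specific rules (coordinate ranges, ISO dates) are assumptions chosen for the example.

```python
import csv
import io
from datetime import date

# Hypothetical raw field records, as they might arrive from a spreadsheet export.
RAW = """catalogNumber,decimalLatitude,decimalLongitude,eventDate
UF-001,29.64,-82.35,2015-03-09
UF-002, 29.65 ,-82.34,2015-03-09
UF-003,n/a,-82.33,2015-03-10
UF-004,129.6,-82.32,2015-03-10
"""

FIELDS = ["catalogNumber", "decimalLatitude", "decimalLongitude", "eventDate"]


def to_float(text):
    """Parse a number, returning None if the text is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def to_date(text):
    """Parse an ISO 8601 date (YYYY-MM-DD), returning None on failure."""
    try:
        return date.fromisoformat(text.strip())
    except ValueError:
        return None


def technically_correct(raw_text):
    """Raw -> technically correct: every value gets its proper type or None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_text)):
        rows.append({
            "catalogNumber": rec["catalogNumber"].strip(),
            "decimalLatitude": to_float(rec["decimalLatitude"]),
            "decimalLongitude": to_float(rec["decimalLongitude"]),
            "eventDate": to_date(rec["eventDate"]),
        })
    return rows


def consistency_problems(row):
    """Technically correct -> consistent: list the domain rules a record breaks."""
    problems = []
    lat, lon = row["decimalLatitude"], row["decimalLongitude"]
    if lat is None or not -90 <= lat <= 90:
        problems.append("latitude missing or outside [-90, 90]")
    if lon is None or not -180 <= lon <= 180:
        problems.append("longitude missing or outside [-180, 180]")
    if row["eventDate"] is None:
        problems.append("eventDate missing or not YYYY-MM-DD")
    return problems


def clean(raw_text):
    """Split records into consistent rows and (catalogNumber, problems) pairs."""
    consistent, flagged = [], []
    for row in technically_correct(raw_text):
        problems = consistency_problems(row)
        if problems:
            flagged.append((row["catalogNumber"], problems))
        else:
            consistent.append(row)
    return consistent, flagged


def write_csv(rows, handle):
    """File output: write processed records as CSV (dates become YYYY-MM-DD)."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    consistent, flagged = clean(RAW)
    out = io.StringIO()
    write_csv(consistent, out)
    print(out.getvalue())
    print(flagged)
```

Flagged records are kept alongside their problems rather than silently dropped, so they can be checked against the original field notes.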

The concepts, skills, and tools we teach are domain-independent, but example problem cases and datasets will be taken from organismal and evolutionary biology, biodiversity science, ecology, and environmental science.

Updates to the course wiki will be posted here as they become available.

Workshop Evaluation

  • link to pre-workshop survey (if we do one)
  • Post Workshop Survey Results

Agenda

Course Overview - Day 1, Monday March 9th
Time Activity Responsible
8:00-8:30 Registration.
Name tags, wired/wireless access, Adobe Connect, check-in.
All, Deb Paul (iDigBio)
8:30-8:50 Welcome and Introduction to iDigBio. (pptx)
Motivation = Research! (pptx)(pdf version)
Deb Paul (iDigBio) &
Pam Soltis (iDigBio PI)
8:50-9:10 Why a Field-to-Database Biodiversity Informatics Workshop? R_files_modeling Charlotte Germain-Aubrey (iDigBio Post Doc) and Katja Seltmann (TTD-TCN)
9:10-9:30 Let's go to the field! The best places are wet, isolated, and without internet. A story of the trials of typical fieldwork. Emilio Bruna
9:30-9:40 How do you prioritize where to collect? How do you plan a collecting trip? What resources do you bring into the field? Grant Godden
9:40-10:00 Field templates, workflow, and planning ahead for better results. Andrew Short
10:00-10:10 Collecting RNA, DNA & flower color. Lessons from a recent field trip. Grant Godden
10:00-10:30 Break (remember Pascal's: tea and drip coffee are free with your name tag; check in at the counter)
10:30-11:10 Data and metadata standards for biodiversity media: the past, present and future. Mike Webster
11:10-11:30 Top 10 mobile applications every biologist should know about. Download and try them. Emilio Bruna
11:30-12:00 Transport to Natural Teaching Area (vans)
12:00-1:00 Lunch (brown bag provided) (organizers set up demo areas)
12:00-12:30 Brown bag lunch discussion. Standards: Darwin Core and more, with emphasis on the benefits of using them from the start. Presented in the field using a handout and conversation about Darwin Core and other standards. Input from outside experts is important for addressing sound, image, paleontological, and ecological standards. Metadata.
Field Handout - 1) Summary of some relevant standards including: Darwin Core, Ecological Metadata Language (EML), Audubon Media Extension, Global Genome Biodiversity Network (GGBN) and 2) Best practices for writing a locality description.
Deb Paul
12:30-1:00 Brown bag lunch discussion. Students try one of the cell phone or tablet applications presented by Emilio. Download a GPS app if you do not have one! Students who do not have a mobile device are encouraged to share. Everyone
1:00-3:30 Breakout Group 1: Activity (60 min): Students are grouped into pairs or groups of three. Each team does two rounds of mini-collecting, 10 minutes each, for a total of 20 minutes. In the first 10 minutes, each team collects a few insects and records data about them on blank paper (e.g., a journal page). In the second 10 minutes, each team repeats the process but fills in a generic data sheet instead. The collecting focus is insects on plants. Andrew Short & Grant Godden
1:30-3:30 Breakout Group 2: Activity (60 min): Collecting media in the field. Audio and video recordings, as well as photographs, of animals in nature are increasingly important sources of data for biodiversity studies, yet there are few standards for how they should be collected in the field, what metadata should accompany them, and how to preserve them and make them accessible to the research community. In this activity we will demonstrate and discuss basic techniques for collecting biodiversity media and metadata in the field, as well as techniques being developed to deposit those data quickly and easily in a secure archive. Mike Webster
2:45-3:15 Break between breakout group exercises Everyone
3:30-4:00 Group photo! Travel back to the classroom and begin discussing and debriefing the field experience. Discussions will continue into the morning of Day 2. Everyone
4:00-4:30 Review of field apps with students. Which worked and which didn’t? How might students apply these applications in the field? Emilio Bruna
4:30-5:00 Recap and homework (videos) for tomorrow, and further presentations and discussion. Katja Seltmann
6:00 Dinner on your own. Option to have dinner together if desired.
Course Overview - Day 2, Tuesday March 10th
8:30-9:00 Check in, answer questions All, Deb Paul
9:00-9:40 Fossil field collection and field site 3D reconstruction, including present paleo databases and standards. Justin Woods
9:40-10:00 Efficient workflow from collection to cataloging for marine invertebrates. François Michonneau
10:00-10:20 Discussion of the template field exercise. Andrew Short & Grant Godden
10:20-11:00 General discussion: general issues in field data collection to data synthesis. Describe common problems with field data sources and their impacts. All, Katja Seltmann
11:00-11:20 Break All
11:20-12:00 Reproducible Research Derek Masaki
12:30-1:30 Lunch (on your own)
1:30-5:00 Getting started with R François Michonneau (Lead)
5:00-5:30 Review / Homework? / Preview of tomorrow
Course Overview - Day 3, Wednesday March 11th
8:30-9:00 Check in, answer questions All, Deb Paul
9:00-9:20 Review of the new specimen data set for today's R lesson: identify issues and errors.
iDigBio R for Data Processing
Unzip this file to open the HTML version of the iDigBio R for Data Processing lesson steps.
Derek Masaki (Lead)
9:20-10:20 Data exploration using R. Import and display. Derek Masaki (Lead)
10:20-12:30 From raw data to technically correct data.
From technically correct data to consistent data.
Derek Masaki (Lead)
12:30-1:30 Lunch on your own
1:00-1:45 File output. Writing processed data to file. Derek Masaki (Lead), François Michonneau
1:45-2:45 Review. Work-on-your-own data set.
2:45-4:45 Intro to R Markdown OR Break Outs François Michonneau
5:00-5:30 Review / Wrap-up / Preview of tomorrow
Course Overview - Day 4, Thursday March 12th
8:30-9:00 Check in, answer questions All, Deb Paul
9:00-12:00 Using R to access biodiversity APIs
  • 9:00-9:30 Explanation of API & packages (Matt)

Media:2015-03-12-F2DB-Apis.pdf

  • 9:30-10:00 Installing the needed R packages (François)
  • 10:00-10:20 Break
  • 10:20-12:00 Working with APIs using packages (Matt)

Media:2015-03-12-F2DB-R_pkg_lesson.pdf R Script for lesson

François Michonneau, Matt Collins (Leads)
12:00-1:00 Lunch on your own
1:00-1:45 Getting your data out there: publishing & standards with iDigBio Molly Phillips, Matt Collins (Leads)
2:30-4:00 Publishing data on Dryad Todd Vision, Dryad (http://datadryad.org) (Lead)
4:00-5:00 Review, Wrap-up, Survey, Next Steps. One-slide lightning talks by participants.
Optional evening session: working with participants' own data?

Future plans: Scaling it up: Demo using the iPlant Discovery Environment (DE)

Link to Workshop Report

Logistics

Adobe Connect Access

Adobe Connect will be used to provide communication between all present at the workshop.
Remote participants will be able to listen to lecture portions only.


Presentation Documents and Links

More Field to Database Workflows

Leading-Edge and Trends in Collecting Methods
People from across the planet responded to our call for more examples of how data gets from the field into a database.

  1. Biocode Field Information Management System ppt youtube. A Field Information Management System (FIMS) enables data collection at the source (in the field) by generating spreadsheet templates, validating data, and assigning persistent identifiers for every unique biological sample. The following diagram shows how the system works. The most typical functions are Generating Templates and Validating Data, both of which can be found under the Tools menu.
    Generate a Template
    Validate data
    How FIMS works
  2. Field Host Collecting Workflow with Arthropod Easy Capture (mp4). A highly efficient, field-tested workflow for recording and databasing insects and their host plants, developed by Randall Schuh and others during the Plant Bug Planetary Biodiversity Project. It records collecting event information in great detail, including images and host plant material. The AEC database is open source, easily installed, and submits data to iDigBio and Discover Life.
  3. Field to Freezer: Low tech collecting; high quality data. Shelley James, Herbarium Pacificum, Bishop Museum
  4. From the Field Into Specify: several options. (mp4) Andrew Bentley, Specify, University of Kansas Biodiversity Institute
    Installation Package for Specify
  5. From the Field Into Symbiota
Part 1: Field researcher's perspective (time: 10:20) – Shows how a field researcher can enter a voucher specimen along with a field image, link the voucher to a checklist, and print labels to be distributed with the specimen vouchers.
Part 2: Curator’s perspective (time: 11:05) – Shows how a curator can import a record from the collector’s data set to their own collection rather than retyping the label data from scratch. Also includes how identification annotations can filter down the network of specimen duplicates to correct a misidentification within the original checklist.
  6. Digitally archiving localities through the use of their coordinates. Amy Smith, Collections Manager of Earth Sciences, Perot Museum of Nature and Science
    Digitally visualizing and archiving coordinates using KML files
    PDF to accompany video
  7. Filling Biodiversity Knowledge Gaps (GBIF video). Dr Arturo Ariño discusses potential information gaps between different sources of data, using two case studies: UN Biosphere Reserves in Mexico and Spain.
  8. ABC Taxa: the Journal Dedicated to Capacity Building in Taxonomy and Collection Management.
    1. Volume 8 - Manual on Field Recording Techniques and Protocols for All Taxa Biodiversity Inventories. (2010) Jutta Eymann, Jérôme Degreef, Christoph Häuser, Juan Carlos Monje, Yves Samyn & Didier VandenSpiegel Eds. The field recording techniques in this volume go beyond traditional collecting and preserving of organisms (including soil sampling) to include camera trapping and bio-acoustics.
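The three FIMS functions described in item 1 (generating templates, validating data, and assigning identifiers) can be illustrated generically. This is a hypothetical sketch, not the Biocode FIMS implementation or its API: the template columns, the required-field and duplicate-ID rules, and the `sampleGUID` column holding a random UUID URN are all assumptions made for the example.

```python
import csv
import io
import uuid

# Hypothetical template columns; a real project would define its own list.
TEMPLATE_COLUMNS = ["materialSampleID", "scientificName", "eventDate",
                    "decimalLatitude", "decimalLongitude", "recordedBy"]
# Hypothetical rule: these columns must be filled in on every row.
REQUIRED_COLUMNS = ("materialSampleID", "scientificName", "eventDate")


def generate_template(handle):
    """Generate a template: a CSV file containing only the header row."""
    csv.writer(handle).writerow(TEMPLATE_COLUMNS)


def validate(rows):
    """Validate filled-in rows; return (spreadsheet_row, message) pairs."""
    problems = []
    seen_ids = set()
    for row_number, row in enumerate(rows, start=2):  # row 1 holds the header
        for column in REQUIRED_COLUMNS:
            if not (row.get(column) or "").strip():
                problems.append((row_number, f"missing {column}"))
        sample_id = (row.get("materialSampleID") or "").strip()
        if sample_id in seen_ids:
            problems.append((row_number, f"duplicate materialSampleID {sample_id}"))
        if sample_id:
            seen_ids.add(sample_id)
    return problems


def assign_identifiers(rows):
    """Return copies of rows with a hypothetical 'sampleGUID' (random UUID URN)."""
    return [dict(row, sampleGUID=f"urn:uuid:{uuid.uuid4()}") for row in rows]


if __name__ == "__main__":
    template = io.StringIO()
    generate_template(template)
    filled = template.getvalue() + (
        "S1,Puma concolor,2015-03-09,29.64,-82.35,A. Collector\n"
        "S1,,2015-03-09,29.65,-82.34,A. Collector\n"
    )
    rows = list(csv.DictReader(io.StringIO(filled)))
    problems = validate(rows)
    print(problems)
    if not problems:
        print(assign_identifiers(rows))
```

Validating before minting identifiers means only rows that pass the checks receive one. This sketch only generates the identifiers; making them persistent would also require storing and resolving them.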

Biodiversity APIs

taxize tutorial
taxize on github
ridigbio
Open Tree of Life APIs
Introduction to the VertNet API
rgbif on github
rgbif tutorial
rgbif: Interface to the Global Biodiversity Information Facility API
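Packages such as rgbif, ridigbio, and taxize are R clients for web APIs. To show roughly what such a package does under the hood, here is a minimal Python sketch of a GBIF occurrence search. It assumes the public GBIF v1 endpoint `https://api.gbif.org/v1/occurrence/search` with `scientificName` and `limit` query parameters, and a JSON response that lists matching records under `results`; check the GBIF API documentation before relying on these details.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public GBIF API v1 occurrence search endpoint.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_query(scientific_name, limit=5):
    """Build a GBIF occurrence search URL for one scientific name."""
    params = {"scientificName": scientific_name, "limit": limit}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def fetch_occurrences(scientific_name, limit=5, timeout=30):
    """Query GBIF (network access required) and return
    (scientificName, decimalLatitude, decimalLongitude) tuples."""
    url = build_occurrence_query(scientific_name, limit)
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return [
        (r.get("scientificName"), r.get("decimalLatitude"), r.get("decimalLongitude"))
        for r in payload.get("results", [])
    ]
```

Calling `fetch_occurrences("Puma concolor", limit=3)` requires network access. In R, rgbif's `occ_search()` wraps this same kind of request and returns the results as data frames.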

Useful Links and Materials

Workshop Recordings

Day 1

Day 2

Day 3

Day 4

  • 9:00-12:00
  • 1:00-2:30
  • 2:30-4:00
  • 4:00-5:00

Related Workshop Resources and Links

Links from You

Related Blog Posts and Photos

Digitization Training Workshops Wiki Home