Difference between revisions of "Field to Database"

From iDigBio
 
(68 intermediate revisions by 11 users not shown)
Line 2: Line 2:
 
! colspan="2" style="background:#D58B28;width:200px;font-size:10pt" |Field to Database
 
! colspan="2" style="background:#D58B28;width:200px;font-size:10pt" |Field to Database
 
|-
 
|-
| colspan="2" style="text-align:center;font-size:7pt" |[[File:graphic1.png|center|500px|]]<br />
+
| colspan="2" style="text-align:center;font-size:7pt" |[[File:graphic1.png|center|400px|]]<br />
 
|-
 
|-
 
!colspan="2" style="background:#D58B28;text-align:center;font-size:9pt" | Quick Links for Field to Database
 
!colspan="2" style="background:#D58B28;text-align:center;font-size:9pt" | Quick Links for Field to Database
Line 8: Line 8:
 
|[https://www.idigbio.org/wiki/index.php/Field_to_Database#Agenda Field to Database Workshop Agenda]
 
|[https://www.idigbio.org/wiki/index.php/Field_to_Database#Agenda Field to Database Workshop Agenda]
 
|-  
 
|-  
|Field to Database Workshop Biblio Entries
+
|[https://www.idigbio.org/biblio?f%5bkeyword%5d=460 Field to Database Workshop Biblio Entries]
 
|-  
 
|-  
|Field to Database Workshop Report (Workshop Blog)
+
|[https://www.idigbio.org/content/rmarkdown-github-reproducible-research Field to Database Workshop Report (Workshop Blog)]
 
|}
 
|}
 
[[Category:Workshop]][[Category: Data carpentry]][[Category: Biodiversity informatics]]
 
[[Category:Workshop]][[Category: Data carpentry]][[Category: Biodiversity informatics]]
<div>This wiki supports the short course - Field to Database: Biodiversity Informatics and Data Management Skills for Specimen Based Research. Where? The University of Florida at iDigBio from March 9 - 12, 2015. It is the third in a series of four biodiversity informatics workshops planned in collaboration with the [http://tcn.amnh.org/ Tri-Trophic Thematic Collection Network] for iDigBio in the upcoming year (2014-2015). The fourth workshop in this series is Sept 15-16, 2015 and focuses on Data Management for Collection Managers.</div>
+
<div>This Wiki supports the short course - Field to Database: Biodiversity Informatics and Data Management Skills for Specimen Based Research. Where? The University of Florida at iDigBio from March 9 - 12, 2015. It is the third in a series of four biodiversity informatics workshops planned in collaboration with the [http://tcn.amnh.org/ Tri-Trophic Thematic Collection Network] for iDigBio in the upcoming year (2014-2015). The fourth workshop in this series is Sept 15-16, 2015 and focuses on [https://www.idigbio.org/wiki/index.php/Managing_Natural_History_Collections_Data_for_Global_Discoverability Data Management for Collection Managers].</div>
  
 
== Apply Now ==
 
== Apply Now ==
Line 101: Line 101:
 
The concepts, skills, and tools we teach are domain-independent, but example problem cases and datasets will be taken from organismal and evolutionary biology, biodiversity science, ecology, and environmental science.
 
The concepts, skills, and tools we teach are domain-independent, but example problem cases and datasets will be taken from organismal and evolutionary biology, biodiversity science, ecology, and environmental science.
  
Updates to course wiki will be posted to this website as they become available.
+
Updates to course Wiki will be posted to this website as they become available.
  
 
===Workshop Evaluation===
 
===Workshop Evaluation===
* link to pre-workshop survey (if we do one)
+
* Our pre-workshop survey simply asked participants to rank their R skills. With 19 respondents, our participants formed a heterogeneous group:
* Post Workshop Survey Results
+
** 6 chose "Low. I am a total beginner, have no or little experience, or have only gone through the R tutorial."
 +
** 5 chose "Somewhat low. I have used R, but only under the guidance of someone more expert (e.g., during a course or workshop)."
 +
** 5 chose "Neither high nor low. I can use and adapt scripts written by other people."
 +
** 3 chose "Somewhat high. I can write my own scripts."
 +
* [[Media:F2dbfinalsurvey.pdf|Post Workshop Survey Results for Field to Database]]
  
 
==Agenda==
 
==Agenda==
Line 124: Line 128:
 
|-
 
|-
 
|830 - 850
 
|830 - 850
|[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/IntroAndLogisticsF2DBiDigBio.pptx Welcome and Introduction to iDigBio.] (pptx)<br/>[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/F2DB%20PSoltis.pptx Motivation = Research!] (pptx)([https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/F2DB%20PSoltis.pdf pdf version])
+
|[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/IntroAndLogisticsF2DBiDigBio.pptx Welcome and Introduction to iDigBio.] (pptx)<br/>[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/F2DB%20PSoltis.pptx Motivation = Research! (pptx)][https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/F2DB%20PSoltis.pdf (pdf)]
 
|Deb Paul (iDigBio) &<br/> Pam Soltis (iDigBio PI)
 
|Deb Paul (iDigBio) &<br/> Pam Soltis (iDigBio PI)
 
|-
 
|-
 
|850 - 910
 
|850 - 910
|Why a Field-to-Database Biodiversity Informatics Workshop?
+
|[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/DataCarpentryCharlotte.pptx Why a Field-to-Database Biodiversity Informatics Workshop? (pptx)]([[Media:DataCarpentryCharlotte.pdf|pdf]])<br/>[https://www.dropbox.com/sh/crmaz7smc3w8qmf/AACe83xbxgDY_i6NisH7vVXma?dl=0 R_files_modeling]
 
|Charlotte Germain-Aubrey (iDigBio Post Doc) and Katja Seltmann (TTD-TCN)
 
|Charlotte Germain-Aubrey (iDigBio Post Doc) and Katja Seltmann (TTD-TCN)
 
|-
 
|-
Line 136: Line 140:
 
|-
 
|-
 
|930 - 940
 
|930 - 940
|How to prioritize where you collect? How do you plan a collecting trip? What kind of resources do you bring in the field?
+
|[[Media:IDigBio_FieldworkPlanning_F2DB.pdf|Using Digital Resources to Plan Field Expeditions]]<br/>How to prioritize where you collect? How do you plan a collecting trip? What kind of resources do you bring in the field?
 
|Grant Godden
 
|Grant Godden
 
|-
 
|-
 
|940 - 1000
 
|940 - 1000
|Field templates, workflow, and planning ahead for better results.
+
|[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/iDigBioWorkshopTalk_SHORT.pdf Tips and Workflows for Managing Field Data]<br/>Field templates, workflow, and planning ahead for better results.
 
|Andrew Short
 
|Andrew Short
 
|-
 
|-
 
|1000 - 1010
 
|1000 - 1010
|Collecting RNA, DNA & flower color. Lessons from a recent field trip.
+
|[[Media:IDigBio_GenomicResources_F2DB.pdf| Standards for Collection of Genomic Resources]]<br/>Collecting RNA, DNA & flower color. Lessons from a recent field trip.
 
|Grant Godden
 
|Grant Godden
 
|-
 
|-
Line 156: Line 160:
 
|-
 
|-
 
|1110 - 1130
 
|1110 - 1130
|Top 10 mobile applications every biologist should know about. Download and try.
+
|Top 10 mobile applications every biologist should know about. Download and try. Here are some.
 +
*Compass [https://itunes.apple.com/us/app/commander-compass-lite/id340268949?mt=8 Commander Compass Lite]
 +
*Random Number Generator: [https://play.google.com/store/apps/details?id=com.brandao.randomnumbergenerator&hl=en Generate Random Numbers App]
 +
*Range Finder/Height Measurements: [https://play.google.com/store/apps/details?id=kr.sira.measure&feature=search_result#?t=W251bGwsMSwyLDEsImtyLnNpcmEubWVhc3VyZSJd Smart Measure]
 +
*Epicollect: for Dataforms http://www.epicollect.net/<br/>
 +
*Sound Recording:
 +
**Free: [https://itunes.apple.com/us/app/audionote-lite-notepad-voice/id379301403?mt=8 AudioNote Lite - Notepad and Voice Recorder]
 +
**Pay: [https://itunes.apple.com/us/app/irecorder-pro-audio-recorder/id285750155?mt=8 iRecorder Pro - Audio Recorder] (Pay) Or
 +
***[https://itunes.apple.com/us/app/voice-recorder-hd/id373045717?mt=8 Voice Recorder HD for Audio Recording, Playback, Trimming and Sharing]
 +
*OPTIONAL BUT USEFUL FOR MANY
 +
**Light: [https://itunes.apple.com/us/app/geotag-photos-lite/id374252911?mt=8 Geotag Photos Lite]
 +
***PAY: Geotagging Photos: [https://itunes.apple.com/us/app/geotag-photos-pro/id355503746 Geotag Photos Pro]
 
|Emilio Bruna
 
|Emilio Bruna
 
|-
 
|-
Line 168: Line 183:
 
|-
 
|-
 
|1200 - 1230
 
|1200 - 1230
|Brown bag lunch discussion. Standards: Darwin Core and more. Emphasis of benefits of starting off using them right away. Presented in field using a handout and conversation regarding Darwin Core and other standards. Input from outside experts important for addressing sound/image/paleontological and ecological standards. Metadata.
+
|Brown bag lunch discussion. Standards: Darwin Core and more. Emphasis of benefits of starting off using them right away. Presented in field using a handout and conversation regarding Darwin Core and other standards. Input from outside experts important for addressing sound/image/paleontological and ecological standards. Metadata.<br />[http://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/DarwinCoreStandardsHandoutV2.docx Field Handout] - 1) Summary of some relevant standards including: Darwin Core, Ecological Metadata Language (EML), Audubon Media Extension, Global Genome Biodiversity Network (GGBN) and 2) Best practices for writing a locality description.
 
|Deb Paul
 
|Deb Paul
 
|-
 
|-
Line 176: Line 191:
 
|-
 
|-
 
|100 - 330
 
|100 - 330
|'''Breakout Group 1''': Activity (60min): Students are grouped into pairs or groups of three. Each team does two rounds of mini-collecting, 10 minutes each for total of 20 minutes. For the first 10 min: Each team has to collect and record data for a few insects they collect on blank paper (e.g. a journal page). For the second 10 minutes, each team repeats this process but now is given a generic data sheet to fill in. The collecting focus is insects on plants.
+
|'''Breakout Group 1''': Activity (60min): Students are grouped into pairs or groups of three. Each team does two rounds of mini-collecting, 10 minutes each for total of 20 minutes. For the first 10 min: Each team has to collect and record data for a few insects they collect on blank paper (e.g. a journal page). For the second 10 minutes, each team repeats this process but now is given a generic data sheet to fill in. The collecting focus is insects on plants.<br/>
 +
[[Media:FLworkshopDataSheet.pdf|Sample Field Data Collection Sheet]]<br/>
 +
[[Media:FLworkshopFieldLabels.pdf|Sample Field Labels]]
 
|Andrew Short & Grant Godden
 
|Andrew Short & Grant Godden
 
|-
 
|-
Line 210: Line 227:
 
|-
 
|-
 
|900 - 940
 
|900 - 940
| Fossil field collection and field site 3D reconstruction including present paleo databases and standards.
+
|[[Media:JustinWoods-iDigBio-March2015.pdf|Fossil field collection and field site 3D reconstruction including present paleo databases and standards.]]<br/>[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/Excavation%20Instructional%20iDigBio.mp4 Excavation Instructional Video]
 
| Justin Woods
 
| Justin Woods
 
|-
 
|-
 
|940 - 1000
 
|940 - 1000
| Efficient workflow from collection to cataloging for marine invertebrates.
+
|[[Media:Field-methods.pdf|Efficient workflow from collection to cataloging for marine invertebrates.]]
 
| François Michonneau
 
| François Michonneau
 
|-
 
|-
Line 238: Line 255:
 
|-
 
|-
 
|1:30-5:00
 
|1:30-5:00
| Getting started with R
+
|[http://idigbio.github.io/2015-03-09-workshop-field2db/intro-R.html Getting started with R]
 
| François Michonneau (Lead)
 
| François Michonneau (Lead)
 
|-
 
|-
Line 252: Line 269:
 
|-
 
|-
 
|9:00-9:20
 
|9:00-9:20
|Review of new specimen data set for today's R lesson: identify issues, errors.
+
|Review of new specimen data set for today's R lesson: identify issues, errors.<br />[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/iDigBio_Data_Processing.docx iDigBio R for Data Processing]<br />[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/iDigBio_DataProcessing_Presentation.zip unzip this to open HTML file of iDigBio R for Data Processing Lesson Steps]
 
|Derek Masaki (Lead)
 
|Derek Masaki (Lead)
 
|-
 
|-
Line 291: Line 308:
 
|9:00-12:00
 
|9:00-12:00
 
|Using R to access biodiversity APIs
 
|Using R to access biodiversity APIs
*9-930 Explanation of API & packages (Matt)
+
*9:00-9:30 Explanation of API & packages (Matt)<br/>
*930-1000 Installation of packages in R including installing needed packages (Francois)
+
[[Media:2015-03-12-F2DB-Apis.pdf|Introduction to Web APIs]]
*1000-1020 Break
+
*9:30-10:00 Installation of packages in R including installing needed packages (Francois)
*1020-1200 Working with APIs using packages (Matt)
+
*10:00-10:20 Break
 +
*10:20-12:00 Working with APIs using packages (Matt)<br/>
 +
[[Media:2015-03-12-F2DB-R_pkg_lesson.pdf|Using APIs in R]]<br/>
 +
[https://raw.githubusercontent.com/iDigBio/2015-03-09-workshop-field2db/gh-pages/r_pkg_lesson.R R Script for lesson]
 
|Francois Michonneau, Matt Collins (Leads)
 
|Francois Michonneau, Matt Collins (Leads)
 
|-
 
|-
Line 302: Line 322:
 
|-
 
|-
 
|1:00-1:45
 
|1:00-1:45
|Publishing data on iDigBio
+
|[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/MPStandards%2CPublishing%26Ingesting_d2.pptx Getting your data out there: publishing & standards with iDigBio]
 
|Molly Phillips, Matt Collins (Leads)
 
|Molly Phillips, Matt Collins (Leads)
 
|-
 
|-
 
|2:30-4:00
 
|2:30-4:00
|Publishing data on DataDryad (includes discussion of metadata)
+
|[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/vision_field2db_march2015.pdf  Publishing data on Dryad]
|Todd Vision, DataDryad (Lead)
+
|Todd Vision, Dryad (http://datadryad.org) (Lead)
 
|-
 
|-
 
|4:00-5:00
 
|4:00-5:00
Line 344: Line 364:
 
:#[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/Collecting.pptx Field to Freezer: Low tech collecting; high quality data.] Shelley James, Herbarium Pacificum, Bishop Museum
 
:#[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/Collecting.pptx Field to Freezer: Low tech collecting; high quality data.] Shelley James, Herbarium Pacificum, Bishop Museum
 
:#[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/Specify%20for%20field%20data.mp4 From the Field Into Specify: several options.] (mp4) Andrew Bentley, Specify, University of Kansas Biodiversity Institute<br/>[https://www.idigbio.org/wiki/index.php/Specify_6_Appliance_Download_and_Installation Installation Package for Specify]
 
:#[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/Specify%20for%20field%20data.mp4 From the Field Into Specify: several options.] (mp4) Andrew Bentley, Specify, University of Kansas Biodiversity Institute<br/>[https://www.idigbio.org/wiki/index.php/Specify_6_Appliance_Download_and_Installation Installation Package for Specify]
:#From the Field Into Symbiota
+
:#From the Field Into Symbiota<br/>Part 1: [http://idigbio.adobeconnect.com/p9hep7duj96/ Field Reach perspective] (time: 10:20) – Show how a field research can enter a voucher specimen along with a field image, link voucher to a checklist, and print labels to be distributed with the specimen vouchers.<br/>Part 2: [http://idigbio.adobeconnect.com/p68flaqagcg/ Curator’s perspective] (time: 11:05) – Shows how a curator can import a record from the collector’s data set to their own collection rather than retyping the label data from scratch. Also includes how identification annotations can filter down the network of specimen duplicates to correct a misidentification within the original checklist.
::Part 1: [http://idigbio.adobeconnect.com/p9hep7duj96/ Field Reach perspective] (time: 10:20) – Show how a field research can enter a voucher specimen along with a field image, link voucher to a checklist, and print labels to be distributed with the specimen vouchers.
+
::Part 2: [http://idigbio.adobeconnect.com/p68flaqagcg/ Curator’s perspective] (time: 11:05) – Shows how a curator can import a record from the collector’s data set to their own collection rather than retyping the label data from scratch. Also includes how identification annotations can filter down the network of specimen duplicates to correct a misidentification within the original checklist.
+
 
:#Digitally archiving localities through the use of their coordinates. Amy Smith, Collections Manager of Earth Sciences, Perot Museum of Nature and Science<br/>[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/ACS_GoogleMaps.avi Digitally visualizing and archiving coordinates using KML files]<br/>[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/ACS_GoogleMaps.pdf PDF to accompany video]
 
:#Digitally archiving localities through the use of their coordinates. Amy Smith, Collections Manager of Earth Sciences, Perot Museum of Nature and Science<br/>[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/ACS_GoogleMaps.avi Digitally visualizing and archiving coordinates using KML files]<br/>[https://www.idigbio.org/sites/default/files/workshop-presentations/field-to-database/ACS_GoogleMaps.pdf PDF to accompany video]
 
:#[https://vimeo.com/107473692 Filling Biodiversity Knowledge Gaps] (GBIF video) Dr Arturo Ariño discusses potential information gaps that exist between different sources of data, using two case studies the UN Biosphere Reserves in Mexico and Spain.
 
:#[https://vimeo.com/107473692 Filling Biodiversity Knowledge Gaps] (GBIF video) Dr Arturo Ariño discusses potential information gaps that exist between different sources of data, using two case studies the UN Biosphere Reserves in Mexico and Spain.
Line 355: Line 373:
 
:[http://ropensci.org/tutorials/taxize_tutorial.html taxize tutorial]
 
:[http://ropensci.org/tutorials/taxize_tutorial.html taxize tutorial]
 
:[https://github.com/ropensci/taxize taxize on github]
 
:[https://github.com/ropensci/taxize taxize on github]
:[https://github.com/fmichonneau/ridigbio ridigbio]
+
:[https://github.com/idigbio/ridigbio ridigbio]
 
:[https://github.com/OpenTreeOfLife/opentree/wiki/Open-Tree-of-Life-APIs Open Tree of Life APIs]
 
:[https://github.com/OpenTreeOfLife/opentree/wiki/Open-Tree-of-Life-APIs Open Tree of Life APIs]
 
:[https://github.com/VertNet/webapp/wiki/Introduction-to-the-VertNet-API Introduction to the VertNet API]
 
:[https://github.com/VertNet/webapp/wiki/Introduction-to-the-VertNet-API Introduction to the VertNet API]
Line 374: Line 392:
 
*[https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml Google's R Style Guide]
 
*[https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml Google's R Style Guide]
 
**make code easier for you, and others, to understand
 
**make code easier for you, and others, to understand
 +
*[http://rseek.org/ Rseek]
 +
** a search platform just for R resources
 +
*[http://www.rdocumentation.org/ R documentation] An extensive search interface for R packages
 +
*[http://www.rstudio.com/products/rstudio/download/ Download R Studio, newest version for Mac's with new operating systems]
 +
*[http://swcarpentry.github.io/git-novice/ The Software Carpentry Git lesson]
 +
*[http://try.github.io A web-based tutorial to Git]
 +
*[http://markdowntutorial.com/ A web-based tutorial to Markdown]
 +
*[http://rmarkdown.rstudio.com/ An introduction to RMarkdown with RStudio]
 +
*[https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN Using version control (Git) with RStudio]
 +
** Here's a download page for RStudio, INCLUDING the newest version for Mac OSX Yosemite (Mac OS X 10.6+ (64-bit)) (which has had some incompatibilities with certain packages)
 +
* Data publishing links (from Todd Vision)
 +
** The [http://service.re3data.org/search re3data registry] of repositories and the [http://biosharing.org BioSharing] registry of policies, databases and standards
 +
** [http://datadryad.org/pages/jdap Listing] of journals that have adopted the Joint Data Archiving Policy or a similar policy
 +
** [https://dmptool.org/dm_guidance Guidance] on creating a Data Management Plan for NSF or other agencies and an [https://dmptool.org/plans/8332.pdf example plan]
 +
** [https://www.dataone.org/education-modules DataONE educational modules]
 +
** Best practices in preparing data for archiving
 +
*** Borer, ET et al. (2009) [http://www.esajournals.org/doi/full/10.1890/0012-9623-90.2.205 Some simple guidelines for effective data management]. Bulletin of the ESA 48, 205-214
 +
*** Hook, LA et al. (2010) [http://daac.ornl.gov/PI/BestPractices-2010.pdf Best Practices for Preparing Environmental Data Sets to Share and Archive]
 +
*** Penev L et al. (2011) [http://www.pensoft.net/J_FILES/Pensoft_Data_Publishing_Policies_and_Guidelines.pdf  Pensoft Data Publishing Policies and Guidelines for Biodiversity Data]
 +
*** Whitlock MC (2010) [http://dx.doi.org/10.1016/j.tree.2010.11.006 Data archiving in ecology and evolution: best practices] Trends in Ecology & Evolution 26, 61-65.
 +
** Dryad [http://datadryad.org/pages/faq FAQ] and [http://wiki.datadryad.org/Data_Access API] (some functionality, but still under development)
 +
** Examples of articles w/ data packages in Dryad
 +
*** from [http://bdj.pensoft.net/articles.php?id=1071 Biodiversity Data Journal]
 +
*** from [http://www.nature.com/articles/sdata201425 NPG Scientific Data]
 +
** Other examples
 +
*** [http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0058978 Manatee paper] - not meant as an exemplar of data availability!
 +
*** [http://datadryad.org/resource/doi:10.5061/dryad.b003f Heliconia data package in Dryad]
  
 
==Workshop Recordings==
 
==Workshop Recordings==
 
====Day 1====
 
====Day 1====
*9:00am-10:00am  
+
*9:00am-10:00am http://idigbio.adobeconnect.com/p7nh5z5qljf/
*10:15am-11:30pm
+
*10:15am-11:30pm http://idigbio.adobeconnect.com/p60a38m2qhy/
*4:00pm-5:00pm
+
*4:00pm-5:00pm http://idigbio.adobeconnect.com/p88qtlumpjb/
  
 
====Day2 ====
 
====Day2 ====
*9:00am-11:00am  
+
*9:00am-11:00am http://idigbio.adobeconnect.com/p2p6ezjdwdo/
*11:00am-11:30pm
+
*11:30am-12:30pm http://idigbio.adobeconnect.com/p7woy8hro5x/
*11:30pm-12:30pm
+
*1:30pm-2:30pm http://idigbio.adobeconnect.com/p96rvexycsl/
*1:30pm-1:45pm
+
*1:45-5:00pm http://idigbio.adobeconnect.com/p5l1dc47t1p/
*1:45-5:00pm
+
*5:00-5:30pm
+
  
 
====Day3====
 
====Day3====
*9:00-10:00
+
*9:00-12:30 http://idigbio.adobeconnect.com/p21s71147nh/
*10:00-12:00
+
*1:30-3:30 http://idigbio.adobeconnect.com/p6ipnr4eh4v/
*1:00-2:00
+
*3:45-5 http://idigbio.adobeconnect.com/p8fha4j15ex/
*2:00-3:00
+
  
 
====Day4====
 
====Day4====
*9:00-12:00
+
*9:00-10:30 http://idigbio.adobeconnect.com/p9b106642l6/
*1:00-2:30
+
*11:15-12:15 http://idigbio.adobeconnect.com/p285w4uu5xr/
*2:30-4:00
+
*1:30-2:30 http://idigbio.adobeconnect.com/p30irmqksq8/
*4:00-5:00
+
*2:30-5:00 http://idigbio.adobeconnect.com/p7kabi2d68f/
  
 
==Related Workshop Resources and Links==
 
==Related Workshop Resources and Links==
Line 424: Line 466:
 
*[http://datascienceatthecommandline.com/ Data Science at the Command Line]
 
*[http://datascienceatthecommandline.com/ Data Science at the Command Line]
 
*[http://www.it.ufl.edu/training/ Free Training Resources for UF students, faculty, and staff] UF provides free access to over 2600 online training courses through Lynda.com.  Does your institution have similar free training opportunities?
 
*[http://www.it.ufl.edu/training/ Free Training Resources for UF students, faculty, and staff] UF provides free access to over 2600 online training courses through Lynda.com.  Does your institution have similar free training opportunities?
 +
*Very easy-to-use map making service: [http://cartodb.com/ cartodb.com]
 +
 
===Related Blog Posts and Photos===
 
===Related Blog Posts and Photos===
 
*[http://nescent.github.io/2014-05-08-datacarpentry/  Inaugural Data Carpentry Workshop] by Tracy K. Teal
 
*[http://nescent.github.io/2014-05-08-datacarpentry/  Inaugural Data Carpentry Workshop] by Tracy K. Teal

Latest revision as of 10:14, 12 October 2015

Field to Database

Quick Links for Field to Database
Field to Database Workshop Agenda
Field to Database Workshop Biblio Entries
Field to Database Workshop Report (Workshop Blog)
This Wiki supports the short course Field to Database: Biodiversity Informatics and Data Management Skills for Specimen Based Research, held at iDigBio at the University of Florida from March 9 to 12, 2015. It is the third in a series of four biodiversity informatics workshops planned for iDigBio in 2014-2015 in collaboration with the Tri-Trophic Thematic Collection Network. The fourth workshop in the series, on September 15-16, 2015, focuses on Data Management for Collection Managers.

Apply Now

Workshop is full. Application Form is closed.

General Information

This workshop aims to investigate current trends in collecting and to focus on best practices and skills development that support the collection and sharing of robust, fit-for-research-use data. This 4-day short course is designed to be hands-on and will mix lectures with field work and participant exercises and presentations.

Planning Team

Deb Paul (iDigBio), Katja Seltmann (TTD-TCN, AMNH), François Michonneau (FLMNH - iDigBio), Derek Masaki (USGS - BISON), Pam Soltis (FLMNH - iDigBio PI), Shari Ellis (iDigBio), Kevin Love (iDigBio)

About

Skill Level

Some experience with R is required. If you are new-ish to R, we ask that you take an intro to R course before the workshop. There are several good options:

Try R (Code School course)
Intro to R (Coursera course, starts Feb 2nd)
Beginner Course: Up and Running with R with Barton Poulson (course at lynda.com)
Intermediate Course: R Statistics Essential Training with Barton Poulson (course at lynda.com)

Instructors: François Michonneau (FLMNH - iDigBio), Katja Seltmann (TTD-TCN, AMNH), Derek Masaki (USGS), Matt Collins (ACIS - iDigBio)
Assistants: Deborah Paul (FSU - iDigBio), Matt Cannister (USGS)
Who: The course is aimed at graduate students, postdocs, research staff, and other researchers.
Where: iDigBio in Gainesville, FL
Requirements:

Participants must bring a laptop with a few specific software packages installed.
Participants must have some knowledge of R. This is not a beginner-level course. There are introductions to R you can take on your own before the workshop.
If you will be traveling from out of town, you will need to make your own travel arrangements.

Contact: Please email Deb Paul, dpaul@fsu.edu for questions and information not covered here.
Twitter: #field2db @idigbio

Tuition for the course is free, but prior registration is required for attending. You can register here.

Software Installation Requirements

Software needed for Field to Database Course at iDigBio

Mac OS X
Text Editor
We recommend TextWrangler. In a pinch, you can use nano, which should be pre-installed.
RStudio + R
Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.
Spreadsheet
If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
PC
Text Editor
Notepad++ is a popular free code editor for Windows. Be aware that you must add its installation directory to your system path in order to launch it from the command line (or have other tools like Git launch it for you). The instructions to modify your path are available online here. Please ask your instructor to help you do this.
RStudio + R
Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE.
Spreadsheet
If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
Linux
Text Editor
Kate is one option for Linux users. In a pinch, you can use nano, which should be pre-installed.
RStudio + R
You can download the binary files for your distribution from CRAN. Or you can use your package manager, e.g. for Debian/Ubuntu run apt-get install r-base. Also, please install the RStudio IDE.
Spreadsheet
If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
  • Before the workshop, you must confirm by RSVP that the required software is installed. Instructors are available to help - see your email for their contact information.

We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.

Goals

  • Investigate, observe, discover leading-edge trends in field collecting.
  • Provide examples of best practices for data collecting and data sharing including such data as field data, identifiers, trait data, and environmental variables.
  • Explore data tools, including software such as R as well as field apps.
  • Convey the concept and importance of reproducible research workflows, and methods for creating them.
  • Illustrate how data gets from the field into a collection database and into an aggregator's database.
  • Discuss how data gets published and discovered.

Objectives

  • Students participate in field collecting with subject-matter experts and present what changes they plan to make to their collecting practices in a workshop presentation.
  • Subject-matter experts share what they have learned from seeing / talking with others on this topic.
  • Students work through examples to demonstrate mastery of skills for transforming, enhancing, standardizing data.
  • Through comments, discussion, and perhaps post-workshop survey, students demonstrate they grasp the importance of metadata and understand the conceptual difference between data and metadata.
  • Students write a post-workshop blog post, report, or presentation to synthesize what was learned and pay it forward.

Our curriculum overview

Day 1: Why a Field-to-Database Biodiversity Informatics Workshop? On Site Field Demos from Invited Experts from Paleontology, Ornithology, Ecology, Marine Science, Entomology, and Botany
Day 2: Student 3-minute presentations. General issues in field data collection to data synthesis. Getting started with R.
Day 3: Data exploration using R. Import and display. From raw data to technically correct data. From technically correct data to consistent data. File output. Writing processed data to file.
Day 4: Using R to access biodiversity APIs. Publishing data on iDigBio. Publishing data on DataDryad. Review, Wrap-up, Survey, Next Steps.
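
The Day 3 progression above (raw data, then technically correct data, then consistent data, then file output) can be sketched in a few lines. The workshop itself taught these steps in R; the sketch below uses only Python's standard library, purely as an illustration. It assumes that "technically correct" means every field holds a value of the right type and that "consistent" means values also satisfy simple domain rules. The sample records, the column names (borrowed from Darwin Core terms), the accepted date formats, and all helper functions are hypothetical and are not taken from the workshop materials.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw field records; column names borrow Darwin Core terms.
RAW = """scientificName,eventDate,decimalLatitude,decimalLongitude
Apis mellifera ,2015-03-09,29.64,-82.35
Bombus impatiens,03/10/2015,n/a,-82.36
Danaus plexippus,2015-03-11,129.65,-82.34
"""


def parse_date(text):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_float(text):
    """Return a float, or None when the text is not a number (e.g. 'n/a')."""
    try:
        return float(text)
    except ValueError:
        return None


def technically_correct(rows):
    """Step 1: give every field a proper type (trimmed text, date, number)."""
    return [
        {
            "scientificName": r["scientificName"].strip(),
            "eventDate": parse_date(r["eventDate"]),
            "decimalLatitude": parse_float(r["decimalLatitude"]),
            "decimalLongitude": parse_float(r["decimalLongitude"]),
        }
        for r in rows
    ]


def consistent(rows):
    """Step 2: blank out coordinates that fall outside their valid range."""
    cleaned = []
    for r in rows:
        r = dict(r)
        lat, lon = r["decimalLatitude"], r["decimalLongitude"]
        if lat is not None and not -90 <= lat <= 90:
            r["decimalLatitude"] = None
        if lon is not None and not -180 <= lon <= 180:
            r["decimalLongitude"] = None
        cleaned.append(r)
    return cleaned


def to_csv(rows):
    """Step 3: serialize processed records as CSV text (None becomes empty)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


raw_rows = list(csv.DictReader(io.StringIO(RAW)))
rows = consistent(technically_correct(raw_rows))
processed = to_csv(rows)
print(processed)
```

Keeping each stage as its own function means the intermediate results can be inspected separately, which is the point of distinguishing the stages in the first place. Values that cannot be repaired are left empty rather than guessed.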

The concepts, skills, and tools we teach are domain-independent, but example problem cases and datasets will be taken from organismal and evolutionary biology, biodiversity science, ecology, and environmental science.

Updates to the course Wiki will be posted to this website as they become available.

Workshop Evaluation

  • Our pre-workshop survey simply asked participants to rank their R skills. With 19 respondents, our participants formed a heterogeneous group:
    • 6 chose "Low. I am a total beginner, have no or little experience, or have only gone through the R tutorial."
    • 5 chose "Somewhat low. I have used R, but only under the guidance of someone more expert (e.g., during a course or workshop)."
    • 5 chose "Neither high nor low. I can use and adapt scripts written by other people."
    • 3 chose "Somewhat high. I can write my own scripts."
  • Post Workshop Survey Results for Field to Database

Agenda

Course Overview - Day 1, Monday March 9th
Time Activity Responsible
800 - 830 Registration.
Name tags, wired/wireless, Adobe Connect, check-in.
All, Deb Paul (iDigBio)
830 - 850 Welcome and Introduction to iDigBio. (pptx)
Motivation = Research! (pptx)(pdf)
Deb Paul (iDigBio) &
Pam Soltis (iDigBio PI)
850 - 910 Why a Field-to-Database Biodiversity Informatics Workshop? (pptx)(pdf)
R_files_modeling
Charlotte Germain-Aubrey (iDigBio Post Doc) and Katja Seltmann (TTD-TCN)
910 - 930 Let's go to the field! Where the best places are wet, isolated, and without internet. A story of the trials of typical fieldwork. Emilio Bruna
930 - 940 Using Digital Resources to Plan Field Expeditions
How to prioritize where you collect? How do you plan a collecting trip? What kind of resources do you bring in the field?
Grant Godden
940 - 1000 Tips and Workflows for Managing Field Data
Field templates, workflow, and planning ahead for better results.
Andrew Short
1000 - 1010 Standards for Collection of Genomic Resources
Collecting RNA, DNA & flower color. Lessons from a recent field trip.
Grant Godden
10:00-10:30 Break (remember Pascal's): tea and drip coffee are free with your name tag; check in at the counter
1030 - 1110 Data and metadata standards for biodiversity media: the past, present and future. Mike Webster
1110 - 1130 Top 10 mobile applications every biologist should know about. Download and try. Here are some. Emilio Bruna
11:30 - 12:00 Transport to Natural Teaching Area (vans)
12:00 - 1:00 Lunch (Brown Bag provided) (organizers set up demo areas)
1200 - 1230 Brown bag lunch discussion. Standards: Darwin Core and more. Emphasis on the benefits of using standards from the start. Presented in the field using a handout and conversation about Darwin Core and other standards. Input from outside experts is important for addressing sound, image, paleontological, and ecological standards. Metadata.
Field Handout - 1) Summary of some relevant standards including: Darwin Core, Ecological Metadata Language (EML), Audubon Media Extension, Global Genome Biodiversity Network (GGBN) and 2) Best practices for writing a locality description.
Deb Paul
1230 - 100 Brown bag lunch discussion. Students try one of the cell phone or tablet applications presented by Emilio. Download a GPS app if you do not have one! Sharing is encouraged for students who do not have a mobile device. Everyone
100 - 330 Breakout Group 1: Activity (60min): Students are grouped into pairs or groups of three. Each team does two rounds of mini-collecting, 10 minutes each, for a total of 20 minutes. In the first 10 minutes, each team collects a few insects and records data for them on blank paper (e.g. a journal page). In the second 10 minutes, each team repeats the process but is given a generic data sheet to fill in. The collecting focus is insects on plants.

Sample Field Data Collection Sheet
Sample Field Labels

Andrew Short & Grant Godden
130 - 330 Breakout Group 2: Activity (60min): Collecting media in the field. Audio and video recordings, as well as photographs, of animals in nature are increasingly becoming important sources of data for biodiversity studies, yet there are few standards for how these should be collected in the field, the sorts of metadata that should be included, and how to preserve and make them accessible to the research community. In this activity we will demonstrate and discuss basic techniques for collecting biodiversity media and metadata in the field, as well as techniques that are being developed to deposit those data quickly and easily in a secure archive. Mike Webster
245 - 315 Break between breakout group exercises Everyone
330 - 400 Group Photo! Travel back to Classroom and begin discussion and debriefing from Field experience. Discussions will run into the morning of day 2. Everyone
400 - 430 Review of field apps with students. Which worked and which didn’t? How would students imagine applying these applications in the field? Emilio Bruna
430 - 500 Recap and homework (videos) for tomorrow, and further presentations and discussion. Katja Seltmann
6:00 Dinner on your own. Potential to have dinners together if desired.
Course Overview - Day 2, Tuesday March 10th
8:30-9:00 Check in, answer questions All, Deb Paul
900 - 940 Fossil field collection and field site 3D reconstruction, including current paleo databases and standards.
Excavation Instructional Video
Justin Woods
940 - 1000 Efficient workflow from collection to cataloging for marine invertebrates. François Michonneau
1000 - 1020 Discussion of template field exercise. Andrew Short & Grant Godden
1020 - 1100 General Discussion: General issues in field data collection to data synthesis. Describe common problems with field data sources and impacts of these problems. All, Katja Seltmann
1100-1120 Break All
1120-1200 Reproducible Research Derek
12:30-1:30 Lunch (on your own)
1:30-5:00 Getting started with R François Michonneau (Lead)
5:00-5:30 Review / Homework? / Preview of tomorrow
Course Overview - Day 3, Wednesday March 11th
8:30-9:00 Check in, answer questions All, Deb Paul
9:00-9:20 Review of new specimen data set for today's R lesson: identify issues, errors.
iDigBio R for Data Processing
Unzip this to open the HTML file of the iDigBio R for Data Processing lesson steps
Derek Masaki (Lead)
9:20-10:20 Data exploration using R. Import and display. Derek Masaki (Lead)
10:20-12:30 From raw data to technically correct data.
From technically correct data to consistent data.
Derek Masaki (Lead)
12:30-1:30 Lunch on your own
1:00-1:45 File output. Writing processed data to file. Derek Masaki (Lead), François Michonneau
1:45-2:45 Review. Work-on-your-own data set.
2:45-4:45 Intro to R Markdown OR Break Outs François Michonneau
5:00-5:30 Review / Wrap-up / Preview of tomorrow
Course Overview - Day 4, Thursday March 12th
8:30-9:00 Check in, answer questions All, Deb Paul
9:00-12:00 Using R to access biodiversity APIs
  • 9:00-9:30 Explanation of API & packages (Matt)

Introduction to Web APIs

  • 9:30-10:00 Installing the needed packages in R (François)
  • 10:00-10:20 Break
  • 10:20-12:00 Working with APIs using packages (Matt)

Using APIs in R
R Script for lesson

François Michonneau, Matt Collins (Leads)
12:00-1:00 Lunch on your own
1:00-1:45 Getting your data out there: publishing & standards with iDigBio Molly Phillips, Matt Collins (Leads)
2:30-4:00 Publishing data on Dryad Todd Vision, Dryad (http://datadryad.org) (Lead)
4:00-5:00 Review, Wrap-up, Survey, Next Steps. 1 slide lightning talks by participants
Optional Evening Session -- on working with their own data?

Future plans: Scaling it up: Demo using the iPlant Discovery Environment (DE)

Link to Workshop Report

Logistics

Adobe Connect Access

Adobe Connect will be used for communication among everyone present at the workshop.
Remote participants will be able to listen to lecture portions only.

We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.

Presentation Documents and Links

More Field to Database Workflows

Leading-Edge and Trends in Collecting Methods
People from across the planet answered our call for more examples of how data gets from the field into a database.

  1. Biocode Field Information Management System ppt youtube. A Field Information Management System (FIMS) enables data collection at the source (in the field) by generating spreadsheet templates, validating data, and assigning persistent identifiers for every unique biological sample. The following diagram shows how the system works. The most typical functions are Generating Templates and Validating Data, both of which can be found under the Tools menu.
    Generate a Template
    Validate data
    How FIMS works
  2. Field Host Collecting Workflow with Arthropod Easy Capture (mp4). A highly efficient and field tested workflow for recording and databasing insects and host plants developed by Randall Schuh (and others) during the Plant Bug Planetary Biodiversity Project. Record collecting events information in great detail, including images and host plant material. The AEC database is open-source, easily installed, and submits data to iDigBio and Discover Life.
  3. Field to Freezer: Low tech collecting; high quality data. Shelley James, Herbarium Pacificum, Bishop Museum
  4. From the Field Into Specify: several options. (mp4) Andrew Bentley, Specify, University of Kansas Biodiversity Institute
    Installation Package for Specify
  5. From the Field Into Symbiota
    Part 1: Field Researcher's perspective (time: 10:20) – Shows how a field researcher can enter a voucher specimen along with a field image, link the voucher to a checklist, and print labels to be distributed with the specimen vouchers.
    Part 2: Curator’s perspective (time: 11:05) – Shows how a curator can import a record from the collector’s data set to their own collection rather than retyping the label data from scratch. Also includes how identification annotations can filter down the network of specimen duplicates to correct a misidentification within the original checklist.
  6. Digitally archiving localities through the use of their coordinates. Amy Smith, Collections Manager of Earth Sciences, Perot Museum of Nature and Science
    Digitally visualizing and archiving coordinates using KML files
    PDF to accompany video
  7. Filling Biodiversity Knowledge Gaps (GBIF video) Dr Arturo Ariño discusses potential information gaps that exist between different sources of data, using two case studies: the UN Biosphere Reserves in Mexico and Spain.
  8. ABC Taxa: the Journal Dedicated to Capacity Building in Taxonomy and Collection Management.
    Volume 8 - Manual on Field Recording Techniques and Protocols for All Taxa Biodiversity Inventories. (2010) Jutta Eymann, Jérôme Degreef, Christoph Häuser, Juan Carlos Monje, Yves Samyn & Didier VandenSpiegel Eds. Beyond traditional collecting and preserving of organismal life (including soil sampling), this volume also covers camera trapping and bio-acoustics.
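
The three functions attributed to a Field Information Management System in the first item above (generating a spreadsheet template, validating data, assigning an identifier to every sample) can be illustrated with a minimal Python sketch. This is not the Biocode FIMS code or its API. The column names, the validation rules, and the use of UUIDs as stand-ins for persistent identifiers are all assumptions made for the example.

```python
import uuid

# Hypothetical required columns, named after Darwin Core terms.
REQUIRED = ["scientificName", "decimalLatitude", "decimalLongitude", "eventDate"]


def generate_template(columns=REQUIRED):
    """Return the header line of a blank field spreadsheet."""
    return ",".join(["sampleID"] + list(columns))


def validate(rows):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for number, row in enumerate(rows, start=1):
        for column in REQUIRED:
            if not str(row.get(column, "")).strip():
                errors.append(f"row {number}: missing {column}")
        for column, limit in (("decimalLatitude", 90), ("decimalLongitude", 180)):
            text = str(row.get(column, "")).strip()
            if not text:
                continue  # already reported as missing
            try:
                value = float(text)
            except ValueError:
                errors.append(f"row {number}: {column} is not a number")
                continue
            if abs(value) > limit:
                errors.append(f"row {number}: {column} out of range")
    return errors


def assign_identifiers(rows):
    """Give every row without a sampleID a new, globally unique one."""
    for row in rows:
        if not row.get("sampleID"):
            row["sampleID"] = str(uuid.uuid4())
    return rows


# Invented example records: the second one is deliberately broken.
rows = [
    {"scientificName": "Apis mellifera", "decimalLatitude": "29.64",
     "decimalLongitude": "-82.35", "eventDate": "2015-03-09"},
    {"scientificName": "", "decimalLatitude": "95",
     "decimalLongitude": "-82.35", "eventDate": "2015-03-09"},
]

errors = validate(rows)
accepted = assign_identifiers([rows[0]])
print(generate_template())
print(errors)
```

Validating before assigning identifiers mirrors the idea of catching problems at the source: a record only receives an identifier once it has passed the checks.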

Biodiversity APIs

taxize tutorial
taxize on github
ridigbio
Open Tree of Life APIs
Introduction to the VertNet API
rgbif on github
rgbif tutorial
rgbif: Interface to the Global Biodiversity Information Facility API
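
Packages such as rgbif wrap a web API: they build a request URL, fetch JSON, and reshape the result. The Python sketch below shows that pattern against GBIF's public occurrence search endpoint. It is an illustration only, not how rgbif is implemented. The helper names are hypothetical and the sample payload is invented, trimmed to the few fields the sketch reads.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_url(scientific_name, limit=5):
    """Build the query URL for an occurrence search by scientific name."""
    query = urlencode({"scientificName": scientific_name, "limit": limit})
    return GBIF_OCCURRENCE_SEARCH + "?" + query


def fetch(scientific_name, limit=5):
    """Call the API and decode the JSON response (requires network access)."""
    with urlopen(build_url(scientific_name, limit), timeout=30) as response:
        return json.load(response)


def summarize(payload):
    """Pull name and coordinates out of each record in a decoded response."""
    return [
        (record.get("scientificName"),
         record.get("decimalLatitude"),
         record.get("decimalLongitude"))
        for record in payload.get("results", [])
    ]


# Invented payload with the same overall shape as a search response,
# so the parsing step can be tried without a network connection.
SAMPLE = {
    "results": [
        {"scientificName": "Apis mellifera", "decimalLatitude": 29.64,
         "decimalLongitude": -82.35},
        {"scientificName": "Apis mellifera"},
    ]
}

print(build_url("Apis mellifera"))
print(summarize(SAMPLE))
```

With a network connection, `summarize(fetch("Apis mellifera"))` would run the same parsing over live records. Records without coordinates come back with `None` in those positions, which is worth checking before mapping or modeling.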

Useful Links and Materials

Workshop Recordings

Day 1

Day 2

Day 3

Day 4

Related Workshop Resources and Links

Links from You

Related Blog Posts and Photos

Digitization Training Workshops Wiki Home