Georeferencing for Research Use

iDigBio - CCBER GWG Georeferencing for Research Use, a short course


Quick Links for GWG Second Train the Trainers Workshop
Georeferencing for Research Use - link to agenda
Biblio entries
Georeferencing for Research Use, short course report
 
Hotel and NCEAS map

October 4-7, 2016, at NCEAS (https://www.nceas.ucsb.edu/), Santa Barbara, California

We welcome you to this short course, which focuses on research use of georeferenced natural history collections data. It will include activities and discussions about best practices and tools for georeferencing, capturing locality data in the field, and using georeferenced specimen locality data in research. Attendees must have basic experience with georeferencing techniques and tools, and must be researchers or work directly with researchers.

After the workshop, we will encourage participants to share use cases and any training materials they develop, and to offer workshops, webinars, talks, or other events that promote best practices for georeferencing legacy locality data, for capturing locality data during future biological and paleontological collecting and sampling events, and for using georeferenced data in research.

Anticipated course content includes discussions and activities on georeferencing integration, visualization of georeferenced data, and georeferences for modeling and research. A detailed agenda is in development.

Logistics:

Course Instructor List

(in alphabetical order) David Bloom, Matt Collins, Shelley James, Sara Lafia, Deborah Paul, Marcy Revelez, Nelson Rios, Katja Seltmann, Jessica Utrup, Mike Yost

Bring your Datasets and Laptops:

Participants are strongly encouraged to bring representative datasets from their collections or research that need georeferencing. This exposes everyone to the variety of georeferencing issues found in locality data and gives instructors and participants a chance to work together on any challenges.

Participants must bring their own laptops. Wired network access will be available to everyone to ensure the best possible workshop experience.

Reading Materials and Resources:

  1. Georeferencing.org
  2. Georeferencing Quick Reference Guide
    version 2012-10-08. John Wieczorek, David Bloom, Heather Constable, Janet Fang, Michelle Koo, Carol Spencer, Kristina Yamamoto
  3. Guide to Best Practices for Georeferencing - Chapman, A.D. and J. Wieczorek (eds). 2006
  4. Georeferencing Working Group Training Videos
  5. Doherty, Guo, Liu, Wieczorek, Doke. 2011. Georeferencing Incidents from Locality Descriptions and its Applications: a Case Study from Yosemite National Park Search and Rescue. Transactions in GIS 15(6): 775–793
  6. iDigBio Georeferencing Wiki http://tinyurl.com/idbgeowiki
  7. HerpNET Georeferencing Resources
  8. Take Workshop Notes Together Here
  9. Post - Workshop Survey Questions
  10. Got a Georeferencing Question? Post it on the iDigBio Georeferencing listserv
  11. BITC Global Online Seminar #25: Simple Workflow for Data Cleaning

Wireless / Wired Access Issues:

Both wired and wireless access will be provided to workshop participants. Connectivity instructions will be provided at the workshop.

Goals of the Workshop:

Workshop Objectives:

Topics to be covered
Pre-workshop materials
  • Introductory information about datums, mapping, and coordinate systems
  • Basic georeferencing how-to
During workshop
  • Data standards, DwC terminology and fields (e.g., lat, long, datum), and differences among disciplines (neo- and paleontological fields)
  • Georeferencing toolkit and workflow examples (GEOLocate, maps, other resources, pros and cons)
  • Best practices for field collection of data (locality strings and GPS units, precision, datum)
  • Best practices for georeferencing legacy data, given:
    • Varied research requirements for precision
    • Project and collection management limitations
    • Uncertainty data: polygon vs. point-radius, description, etc.
    • Datum: georectify to a standard datum or keep the verbatim datum
  • Workflows for incorporating data into different collections databases
  • Best-practice syntax in locality descriptions for use in automation vs. verbatim strings
  • Database limitations
  • Multiple geopoint values and their storage (verbatim, automated non-vetted value, georeference to nearest named place, update to a more accurate value, etc.)
  • Downloading datasets: sources and mechanisms
  • Assessing data quality
  • Uncertainty data: availability in data sources and interpretation
  • Tools for aggregating, cleaning, visualizing, and analyzing data (e.g., R, QGIS)
  • Creating maps
  • Spatial analyses
  • Automated tools using geodata
  • Difficult cases, such as geopolitically fluid locations over time and offshore localities
  • Hands-on practice and case studies
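The point-radius approach listed under uncertainty data records a georeference as a center coordinate plus a maximum uncertainty distance. As a minimal sketch, assuming a spherical Earth and both coordinates in the same datum, the great-circle (haversine) distance can test whether a point, such as a reading from a GPS unit, falls inside a point-radius georeference. The function names and the example coordinates are hypothetical.

```python
import math

# Mean Earth radius in meters; treating the Earth as a sphere is an approximation.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against floating-point values slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def within_point_radius(center_lat, center_lon, uncertainty_m, lat, lon):
    """True if (lat, lon) falls inside the circle of a point-radius georeference."""
    return haversine_m(center_lat, center_lon, lat, lon) <= uncertainty_m


# Hypothetical georeference: a center point with a 2 km maximum uncertainty distance.
print(within_point_radius(34.40, -119.80, 2000, 34.41, -119.80))  # True (about 1.1 km away)
```

A point-radius circle is only a summary; where a polygon describes the locality better, a point-in-polygon test would replace the distance check.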


Desired Outcomes:

Schedule of Events - Agenda - in development

Breakfast, lunch, and dinner are on our own every day (not provided).

Day 1, Tuesday October 4th

Recording Day 1

Time
Activity
Presenter
8:45
Pick up Name Tags, Wireless Log-In, Wired Setup, Collaborative Notes (Google Doc)

9:00
Welcome by NCEAS host, Logistics, Trainer Introductions, Introduction to iDigBio, CCBER
Katja Seltmann - CCBER, Debbie Paul - iDigBio, Ben Halpern - Director NCEAS, Ginger Gillquist - Logistics NCEAS
9:20
From the participants and instructors: a quick informal survey

Quick Name/Rank/Serial# introductions

Tools you use
What you'd like to be able to do, and tools you'd like to be able to use
Deb Paul
10:00
Standards, Terms & Fields: Darwin Core Standard, Key Terminology

iDigBio Recommended fields

David Bloom, Shelley James
10:15
Georeferencing Quick Reference Guide and Georeferencing Template
Una Farrell
10:30
Coffee Klatch w/ NCEAS

11:15
Locality Types
Una Farrell
11:45
Georeferencing Calculator, Calculator Manual
David Bloom
12:10
Lunch
13:10
Georeferencing Calculator Example and Exercises, MaNIS/HerpNET/ORNIS Georeferencing Guidelines
David Bloom
13:40
Internet Resources - Where to Begin? georeferencing.org

Exercises using online resources - Version 2

Una Farrell
14:40
Break

15:10
Exercises cont.

15:30
GEOLocate: Overview, Basics & Demos

GEOLocate Introduction
Collaborative Georeferencing using CoGe by GEOLocate

Nelson Rios
17:00
Day in Review
Trivia Question of the Day
Survey (15 min)
17:30
End

Dinner on our own; see the list of local restaurants. Optional evening activity: happy hour and joyful GeoGathering at Hoffmann Brat Haus.

Day 2, Wednesday October 5th

Recording Day 2

Time
Activity
Presenter
8:50
Please complete the Day 1 survey!
9:00
Two! Trivia Questions
Review and Questions
Software install check for tomorrow
All
9:10
GEOLocate: Advanced Features, Collaborative Georeferencing and the GEOLocate API
Nelson Rios
10:00
Importance of Polygons
Mike Yost, Nelson Rios
10:30
Break

11:00
GPS Units and Apps: Exercise Introduction
David Bloom, Mike Yost, Shelley James, Katja Seltmann
11:15
GPS Exercises (continued outside)
All
12:15
Lunch

13:15
GPS Exercises (continued outside)
Please upload your GPS Data here
All
13:30
Good and Bad Localities, Field Locality Handout: MVZ and iDigBio GWG Guide for Recording Localities in Field Notes
Field Information Management Systems (FIMS)
Paper maps
David Bloom
14:15
Georeferencing Workflows: presentations and discussion

Researcher and Collections perspectives: Producers and Consumers

Collection and Data Managers
Researchers
Mike Yost
Jessica Utrup
Sara Lafia
Katja Seltmann
Shelley James
Edward Davis


Digitization Workflows at iDigBio
Georeferencing Protocols and Workflows - from a collections viewpoint
All
15:15
Break
15:45
Online Exercises, Review of known answers

16:30
GPS Exercise - Review (.kmz), Summary Spreadsheet, Field Worksheet, Locality Descriptions
GPSTour
GPS Status
Geopaparazzi
Camera GPS
Theodolite
David Bloom, Jessica Utrup
16:45
Day in Review

Download dataset for tomorrow
Trivia Question of the Day

17:15
Survey (15 min)
17:30
End

Dinner on our own; see the list of local restaurants. Optional evening activities: TBA

Day 3, Thursday October 6th

Download zipped dataset. The parameters for this dataset are specimens in the family Carabidae that have geocoordinates and are in California. It contains about 25,000 records in total.
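The same three parameters can be re-applied to a local copy of the data, for example when checking a download or subsetting a larger file. This is a minimal sketch, assuming a CSV with Darwin Core-style column names (`family`, `stateProvince`, `decimalLatitude`, `decimalLongitude`); the actual headers in an iDigBio download may differ, so rename them to match. The function name and the sample rows are made up for illustration.

```python
import csv
import io


def filter_records(rows, family="Carabidae", state="California"):
    """Yield records in the given family and state that have valid coordinates.

    Assumes Darwin Core-style column names; rename them to match your download.
    """
    for row in rows:
        if (row.get("family") or "").strip().lower() != family.lower():
            continue
        if (row.get("stateProvince") or "").strip().lower() != state.lower():
            continue
        try:
            lat = float(row.get("decimalLatitude"))
            lon = float(row.get("decimalLongitude"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric coordinates
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            yield row


# Tiny made-up sample standing in for the downloaded file.
sample = io.StringIO(
    "family,stateProvince,decimalLatitude,decimalLongitude\n"
    "Carabidae,California,34.41,-119.85\n"
    "Carabidae,California,,\n"
    "Carabidae,Oregon,44.05,-123.09\n"
    "Tenebrionidae,California,34.05,-118.24\n"
)
kept = list(filter_records(csv.DictReader(sample)))
print(len(kept))  # 1
```

For a real file, pass `csv.DictReader` an open file handle (opened with `newline=""`) instead of the in-memory sample.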
Recording Day 3

Time
Activity
Presenter
9:00
Review and Questions
All
9:05
Georeferencing for Research Use Workshop - iDigBio Datasets

filter and get the dataset

Matthew Collins (remote), Katja Seltmann, Shelley James
10:00
Data Quality: How to evaluate existing georeferenced data/Fitness for Use
Katja Seltmann, Shelley James
10:30
Break

11:00
Cleaning Datasets: Spreadsheets, Open Refine, tracking your work
Deb Paul, Nelson Rios, Katja Seltmann
12:00
Lunch

13:00
Cleaning Datasets: Spreadsheets, Open Refine, tracking your work (2)
Deb Paul, Nelson Rios, Katja Seltmann
13:30
Visualizing datasets: Set up QGIS and load data
  • vector: points, lines, polygons
  • raster: images

Auxiliary datasets: Download any additional datasets of interest. Online Tutorial

Sara Lafia
15:00
Break

15:30
Visualizing datasets: Preview and explore toolkits & saving your maps and data
Sara Lafia
17:15
Survey (15 min)
17:30
End

Dinner: TBD

Day 4, Friday October 7th

Download zipped QGIS project. The project, as completed through Day 3, is available for download in the same folder as the auxiliary data. Launch the QGIS project from the Tutorial.qgs file.
Recording Day 4

Time
Activity
Presenter
9:00
Questions and Review

Share your datasets! Upload the research datasets you'd like to work on.

All
9:10
Exploring datasets: Aggregating by Regions
  • Join, aggregate, or summarize records by county
  • Summarize observations intersecting counties
  • Finding altitude
  • Why do this?


Sara Lafia, Katja Seltmann, Nelson Rios
9:50
Exploring datasets: Time animation
  • By collecting event date or collector (to show biases in collection)
  • Check for errors with dates or transcriptions
Sara Lafia
10:30
Break

11:00
Exploring datasets: Uncertainty
  • Bin points based on uncertainty rank
  • Symbolize uncertainty by collector, data quality score - systematic error
Sara Lafia
11:30
Exploring datasets: Spatial autocorrelation
  • Distribution/clustering prevalence based on species, collector, etc.
  • Date-collector correlation
Sara Lafia
12:00
Lunch on our own.
13:00
LifeMapper LIVE DEMO
Jeffrey Cavner, James Beach, et al.
13:15
Work on own data sets/Open question time/Practice. Polygon practice
Nelson Rios, et al.
13:45
Breakout sessions

Cleaning data using R
Accessing APIs using R
More with QGIS
GEOLocate in Symbiota
Advanced GEOLocate
GPS Apps
Georectification
Try GeoODK http://geoodk.com/


15:30
Break

16:00
Research Use of the Data. A conversation from the collective point-of-view of the researchers present.
Challenges? Experiences? Needs (software, skills, infrastructure)? What changes might you make now to your workflows?
Ed Davis, Katja Seltmann, Shelley James, Nelson Rios, Sara Lafia
16:30
Day & workshop in Review
Put the iDigBio webinar on your calendar: October 12, 2016, "Isn't that Spatial?"
Post Workshop Survey

17:30
Beer

Dinner on our own; see the list of local restaurants.
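The Day 4 uncertainty exercise bins points by uncertainty rank before symbolizing them in QGIS. As a minimal sketch of the binning logic outside QGIS, assuming each record carries the Darwin Core field `coordinateUncertaintyInMeters`, the function below assigns rank labels; the thresholds and labels are hypothetical and should be chosen to fit your research question.

```python
import math
from collections import Counter

# Hypothetical rank boundaries (meters); pick thresholds that suit your research question.
RANKS = [(100, "1: <= 100 m"), (1_000, "2: <= 1 km"), (10_000, "3: <= 10 km")]


def uncertainty_rank(value):
    """Map a coordinateUncertaintyInMeters value to a rank label."""
    try:
        meters = float(value)
    except (TypeError, ValueError):
        return "unknown"  # empty or non-numeric value
    if math.isnan(meters) or meters <= 0:
        return "unknown"  # Darwin Core does not treat zero as a valid uncertainty
    for upper, label in RANKS:
        if meters <= upper:
            return label
    return "4: > 10 km"


values = ["30", "250", "", "5000", "20000", None]
counts = Counter(uncertainty_rank(v) for v in values)
for label, n in sorted(counts.items()):
    print(label, n)
```

Keeping "unknown" as its own rank, rather than dropping those records, makes missing uncertainty visible on the map, which is often a data quality finding in itself.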



Some software install instructions from Data and Software Carpentry

Trained Georeferencers

Pre-Workshop Assignments

  1. Attend the pre-workshop online meeting. There are two options; choose one.
    1. Thursday September 15th - two times to choose from:
      1. 11am EDT (10am CDT, 9am MDT, 8am PDT)
      2. 3pm EDT (2pm CDT, 1pm MDT, 12pm PDT)
    2. Sign Up Here: https://goo.gl/forms/WmJO6z79rx5nHlv32
    3. Meet: http://idigbio.adobeconnect.com/geotrain
  2. Please watch the following videos before the workshop (flipped classroom). Note any questions or insights to share with the group.
    1. Collaboration to Automation: https://vimeo.com/53006304 (25 min lecture, 10 min discussion)
    2. Geographical Concepts: https://vimeo.com/53008556 (4 min lecture, 2 min discussion)
      1. https://vimeo.com/album/2163673/video/63692461 (4 min lecture only)
    3. Point Radius Method and Best Practices: https://vimeo.com/53006303 (20 min lecture, 5 min discussion)
    4. OPTIONAL video: BITC Global Online Seminar #25: Simple Workflow for Data Cleaning (1 hour)
  3. Please install the following software
    1. QGIS, and then QGIS plugins. NOTE: once QGIS is installed, it's easy to install most plugins from inside QGIS.
       
      QGIS plugins menu - Manage and Install
       
      QGIS plugins menu
      1. QGIS: http://qgis.org/en/site/forusers/download.html
      2. QGIS plugins: Open QGIS on your laptop and navigate to Plugins > Manage and Install Plugins (as seen in the screenshots). Add each of these plugins by typing its name into the search box and clicking "Install Plugin": Clipper, Coordinate Capture, GPS Tools, Heatmap, Interpolation, OpenLayers, Processing, TimeManager, and Lifemapper.
        1. Clipper (clip intersecting vector features)
        2. Coordinate Capture (find coordinates in various coordinate reference systems (CRS) via mouse-over)
        3. Gazetteer Search (finding named places via a search bar): NOTE: The Gazetteer Plugin is not "discoverable" through the Plugins manager in QGIS. You'll need to follow the installation steps listed here: https://github.com/AstunTechnology/QGIS-Gazetteer-Plugin#Installation
          1. Manual
            1. find where your QGIS is installed on your machine
            2. right click the folder to see contents and find the folder for Plugins
              1. for example, on Deb's Windows 10 laptop, the path to the correct QGIS plugins folder is C:\Users\dlpss\.qgis2\python\plugins
            3. make a folder called gazetteersearch inside of the QGIS Plugins directory
            4. download the contents from GitHub and move them into the gazetteersearch folder
            5. close and reopen QGIS in order for the plugin to show up
          2. via Git
            1. clone the repository into your QGIS Plugins folder following the steps from the link above. Please let Sara know if you have any other questions.
        4. GPS Tools (loading and importing GPS data)
        5. Heatmap (generate a heatmap raster given input vector points)
        6. Interpolation (interpolation techniques given vertices of a vector layer)
        7. OpenLayers (load basemaps from OpenStreetMap, Google, etc.)
        8. Processing (spatial data processing framework)
        9. TimeManager (event-visualization animation for vector features)
        10. Lifemapper: Plugin for Lifemapper webservices for SDM modeling, and multispecies Presence Absence Matrix (PAM) analysis. The tool allows you to build SDM models using GBIF, iDigBio, or user supplied species occurrence data.
    2. OpenRefine (previously Google Refine) is a data-cleaning tool that runs in a web browser. Any browser (Safari, Firefox, Chrome) should work fine; Internet Explorer is not recommended. You will need to download and install OpenRefine. When you open it, it runs in your browser, but you don't need an internet connection, and all data are stored on your computer. (If you run into OpenRefine install issues, see the Open Refine Install or Install Open Refine resources for more help.)
      1. Windows
        1. Go to the OpenRefine download page.
        2. Click on Windows kit to download the install file
        3. To use it, unzip, and double-click on openrefine.exe (if you're having issues with openrefine.exe try refine.bat instead)
        4. OpenRefine will then open in your web browser.
        5. If it doesn't open automatically, open a web browser after you've started the program and go to http://localhost:3333, where you should see OpenRefine.
      2. MacOS
        1. Go to the OpenRefine download page.
        2. Click on Mac kit to download the install file
        3. Open the downloaded .dmg file
        4. Drag the icon in to the Applications folder
        5. Double-click the icon; OpenRefine will then open in your web browser.
        6. If it doesn't open automatically, open a web browser after you've started the program and go to http://localhost:3333, where you should see OpenRefine.
      3. Linux
        1. Go to the OpenRefine download page.
        2. Click on Linux kit to download the install file
        3. Extract the downloaded archive.
        4. Type ./refine in your terminal; OpenRefine will then open in your web browser.
        5. If it doesn't open automatically, open a web browser after you've started the program and go to http://localhost:3333, where you should see OpenRefine.
    3. Spreadsheet software (your choice: LibreOffice, Excel, etc.)
      1. We'll be using a spreadsheet program. If you already have one installed, such as LibreOffice, Excel, or OpenOffice, you can use it. If you don't, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
    4. Java: Please make sure you have Java installed (needed for Open Refine to work).
  4. OPTIONAL software installs and tutorials, if you are interested in the R breakout session we will offer at the workshop.
    1. R & RStudio: R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.
      1. Windows
        1. Video Tutorial
        2. Install R by downloading and running this .exe file from CRAN (http://cran.r-project.org/index.html).
        3. Also, please install the RStudio IDE.
      2. Mac OS X
        1. Video Tutorial
        2. Install R by downloading and running this .pkg file from CRAN (http://cran.r-project.org/index.html).
        3. Also, please install the RStudio IDE.
      3. Linux
        1. You can download the binary files for your distribution from CRAN, or use your package manager.
          1. e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo yum install R.
        2. Also, please install the RStudio IDE.
    2. Then install packages:
    3. R tutorials (OPTIONAL): take a short course in R. If you are a novice, take a beginner course. We don't expect you to know R well, but you should be familiar enough with it to follow along with one of our optional hands-on sessions. There are several good options:
      1. Try R (Code School course)
      2. Beginner Course: Up and Running with R with Barton Poulson (course at lynda.com)
      3. Intermediate Course: R Statistics Essential Training with Barton Poulson (course at lynda.com)
      4. For the future, you could take a Coursera class: Intro to R (Coursera course started August 22nd).
    4. Georeferencing using apps: if you want to try georeferencing this way and compare the results with those from a GPS unit, please install either of these on your device.
      1. GPS Status: available for Android and iOS devices.
      2. Geopaparazzi: Android only
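The manual Gazetteer Search install above comes down to creating a gazetteersearch folder in the QGIS plugins directory and copying the GitHub files into it. As a minimal sketch, assuming the QGIS 2.x layout shown in the Windows example above (`<home>/.qgis2/python/plugins`; newer QGIS versions keep plugins elsewhere), these hypothetical helper functions compute that folder and copy a downloaded, unzipped copy of the plugin into it. Restart QGIS afterwards so the plugin appears.

```python
import shutil
from pathlib import Path


def gazetteer_plugin_dir(home=None):
    """Return the gazetteersearch folder inside the QGIS 2.x user plugins directory.

    Assumes the QGIS 2.x layout <home>/.qgis2/python/plugins.
    """
    home = Path(home) if home is not None else Path.home()
    return home / ".qgis2" / "python" / "plugins" / "gazetteersearch"


def install_gazetteer(downloaded_dir, home=None):
    """Copy the unzipped plugin files from GitHub into the gazetteersearch folder."""
    target = gazetteer_plugin_dir(home)
    target.parent.mkdir(parents=True, exist_ok=True)  # create plugins/ if it is missing
    shutil.copytree(downloaded_dir, target, dirs_exist_ok=True)  # requires Python 3.8+
    return target


print(gazetteer_plugin_dir())
```

Usage, with a hypothetical download location: `install_gazetteer("Downloads/QGIS-Gazetteer-Plugin-master")`. Cloning the repository with Git directly into the gazetteersearch folder achieves the same result.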