Managing Natural History Collections Data for Global Discoverability: Difference between revisions

From iDigBio
Revision as of 17:16, 26 August 2015

Managing Natural History Collections Data for Global Discoverability

Quick Links for Managing NHC Data for Global Discoverability wiki
Managing NHC Data Announcement
Managing NHC Data for Global Discoverability - Agenda
Managing NHC Data for Global Discoverability Biblio Entries
Managing NHC Data for Global Discoverability Report

This wiki supports the Managing Natural History Collections (NHC) Data for Global Discoverability Workshop and is in development. The workshop is sponsored by iDigBio and hosted by the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new Alameda space on September 15-17, 2015. It is the fourth in a series of biodiversity informatics workshops held in fiscal year 2014-2015. The first three were 1) Data Carpentry, 2) Data Sharing, Data Standards, and Demystifying the IPT, and 3) Field to Database (March 9 - 12, 2015).

General Information


Description and Overview of Workshop. Are you:

  • actively digitizing NHC data and looking to do it more efficiently?
  • getting ready to start digitizing NHC data and looking to learn some new skills to enhance your workflow?
  • digitizing someone else’s specimens (e.g., as part of a research project)?
  • finding yourself in the role of the museum database manager (even though it may not be your title or original job)?
  • someone who has a private research collection who wishes to donate specimens and data to a public collection?

The theme of the "Collections Data for Global Discoverability" workshop is ideally suited for natural history collections specialists aiming to increase the "research readiness" of their biodiversity data at a global scale. Have you found yourself in situations where you need to manage larger quantities of collection records, or encounter challenges in carrying out updates or quality checks? Do you mainly use spreadsheets (such as Excel) to clean and manage specimen-level datasets before uploading them into your collections database? The workshop is most appropriate for those who are relatively new to collections data management and are motivated to provide the global research community with accessible, standards- and best practices-compliant biodiversity data.

During the workshop, essential information science and biodiversity data concepts will be introduced (e.g., data tables, data sharing, data quality and cleaning, Darwin Core, APIs). Hands-on data cleaning exercises will use spreadsheet programs and free, readily usable software. The workshop is platform independent: it will not focus on the specifics of any one locally preferred biodiversity database platform, and will instead address fundamental themes and solutions that apply to a variety of database applications.
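
As a rough illustration of what mapping a spreadsheet to Darwin Core can look like, here is a minimal Python sketch. It is not part of the workshop materials. The local column headers and the sample record are invented for illustration; only the right-hand names (catalogNumber, scientificName, recordedBy, eventDate, decimalLatitude, decimalLongitude) are actual Darwin Core terms.

```python
import csv
import io

# Hypothetical mapping from local spreadsheet headers to Darwin Core terms.
# The left-hand names are invented for this sketch; the right-hand names
# are standard Darwin Core terms.
COLUMN_TO_DWC = {
    "Cat No": "catalogNumber",
    "Species": "scientificName",
    "Collector": "recordedBy",
    "Date": "eventDate",
    "Lat": "decimalLatitude",
    "Long": "decimalLongitude",
}


def to_darwin_core(rows):
    """Rename known columns to Darwin Core terms and tidy whitespace."""
    cleaned = []
    for row in rows:
        record = {}
        for header, value in row.items():
            term = COLUMN_TO_DWC.get(header.strip())
            if term is None:
                continue  # unmapped columns are simply dropped in this sketch
            # Trim the ends and collapse internal runs of whitespace.
            record[term] = " ".join(value.split())
        cleaned.append(record)
    return cleaned


# Made-up sample data standing in for a spreadsheet export.
SAMPLE = """Cat No,Species,Collector,Date,Lat,Long,Notes
ASU0001, Carnegiea  gigantea ,J. Doe,2015-09-15,33.4255,-111.9400,made up
"""

records = to_darwin_core(csv.DictReader(io.StringIO(SAMPLE)))
print(records[0])
```

A real mapping would normally keep unmapped columns somewhere (for example in a remarks field) rather than discard them; dropping them here just keeps the sketch short.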


To Do For You: Pre-reading materials [Darwin Core Data Standard, Best Practices for Data Management,...]

Updates will be posted to this website as they become available.

Planning Team

Collaboratively brought to you by: Katja Seltmann (AMNH - TTD-TCN), Amber Budden (DataONE), Edward Gilbert (ASU - Symbiota), Nico Franz (ASU), Mark Schildhauer (NCEAS), Greg Riccardi (FSU - iDigBio), Reed Beaman (NSF), Cathy Bester (iDigBio), Shari Ellis (iDigBio), Kevin Love (iDigBio), Deborah Paul (FSU - iDigBio)

About

Instructors (iDigBio): Katja Seltmann, Amber Budden, Edward Gilbert, Nico Franz, Greg Riccardi, Deborah Paul, Joanna McCaffrey, Kevin Love, Anne Thessen

Skill Level: We are focusing our efforts in this workshop on beginners.

Where and When: Tempe, AZ at the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new Alameda space, September 15 - 17, 2015

Requirements: Participants must bring a laptop.

Contact (iDigBio Participants): Please email Deb Paul (dpaul@fsu.edu) with questions or for information not covered here.

Twitter:

Tuition for the course is free, but there is an application process and spots are limited. The class is now full.

Software Installation Details

A laptop and a web browser are required for participants.
We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.

  • Adobe Connect Systems Test
    • Note: when you follow the link to install and perform the test, some software will install, even though it may look as if nothing happens. To check, simply re-run the test.

Agenda

Schedule - subject to change.

Course Overview - Day 1 - Tuesday September 15th
8:15-8:30 Check-in, name tags, log in, connect to wireless and Adobe Connect All
8:30-9:00 Welcome, Introductions, Logistics, Intro to the Workshop, Why Share Data?
quick exercise - what are your data challenges? what software do you use? how do you track / document your procedures?
Deb Paul, iDigBio
9:00-9:15 Why this workshop? Amber Budden & Deb Paul
9:15-9:35 General Concepts and Best Practices
brief introduction to data modeling, the data life-cycle, and relational databases
Ed Gilbert and Amber Budden
9:35-9:55 Overview of Data standards
Darwin Core, EML, Audubon Core, GGBN, DwC-A, Identifiers (GUIDs vs local)
Ed Gilbert, Deb Paul
10:00-10:30 Hands-on Exercise with Specimen Data Set
data set with known mapping / standardization issues.
All
10:30-10:50 Break all
10:50-11:30 Data Management Planning
choosing a collection management system, data flow, data backup, field-to-database, metadata
Amber Budden
11:30-12:00 Exercise DataONE Lesson 4: best practices for data entry and data manipulation Amber Budden
12:00-1:00 Lunch (Provided by Panera)
1:00-1:30 Images and media issues: a brief intro
choosing a camera, issues across different database platforms, image submissions, linking images to occurrence records, batch processing
Ed Gilbert and Joanna McCaffrey
1:30-1:50 Digitization workflows and process
getting started, prioritization, specimen collecting, new database, integrating old data
Deb Paul, Ed Gilbert & Katja Seltmann
1:50-2:10 Common Workflows
image to data, specimen to data, skeletal records, crowd-sourcing, OCR/NLP, georeferencing, metadata
Deb Paul, Ed Gilbert & Katja Seltmann
2:10-2:25 Optimization
Reviewing your own workflow, common bottlenecks, policy, documentation
Katja Seltmann, Deb Paul & Ed Gilbert
2:25-3:00 (time to increase here, decrease above) Hands-on exercise (to be decided) (May Be Group Tours: 1) Insects, 2) Botany, 3) Symbiota) tbd
3:00-3:20 Break
3:20-3:50 Georeferencing Data (Georeferencing Workflow)
visualization tools, when to georeference, best practices (the import of standards): error uncertainty, georeferencingRemarks
Ed Gilbert
3:50-4:10 GEOLocate Exercise (May be DEMO)
CoGe, GPS Visualizer, re-integration, qc
Ed Gilbert
4:40-5:30 Conversation, overview of day, volunteers, preview for tomorrow... All
Course Overview - Day 2 - Wednesday September 16th
8:30-12:00 Desert Botanical Garden (DBG) Field Trip and Lunch
meet at 8:30 in Hotel Lobby, depart at 8:40 for DBG; garden from 9-11:30, lunch 11:30 - 12:30, depart 12:40 to ASU
12:00-1:00 Lunch at Gertrude's (in the Garden)
1:00-1:25 Welcome Back and Intro to Data Quality
inside the data-life-cycle, cost of data quality, quality vs completeness
Amber Budden, Greg Riccardi, (Ed Gilbert)
1:25-1:40 Data Cleaning
where, when and how does it happen?, what kind of feedback to expect
types of common errors and omissions, best practices strategies, feedback and annotation, error tracking, automation, policies and protocols
Deb Paul & Katja Seltmann
1:40-2:20 Data Cleaning Exercise I
(opt: quick exercise - spot the snafus)
better spreadsheet skills (Data Carpentry)
Deb Paul & Katja Seltmann
2:20-2:50 Data Cleaning Exercise II
Open Refine, part I (facets, clustering)
Deb Paul & Katja Seltmann
2:50-3:10 Break
3:10 - 4:40 time for discussion / break outs / unconference topics or demos Deb Paul & Katja Seltmann
4:40-5:00 Conversation, overview of day for context and questions, homework and preview for tomorrow... Deb Paul & Katja Seltmann
Course Overview - Day 3 - Thursday September 17th
8:45-9:00 Discussion of Material Covered so far and Overview of Day 3 Katja Seltmann
9:00-9:40 Review Tools for Data Cleaning, Data Manipulation, and Visualization (and Lessons)
Kurator, GPS Visualizer, GEOLocate, CoGe, Google Maps, CartoDB, Google Fusion Tables, Notepad++, Open Refine, Access, (others)
Where do they fit in your workflow?
Deb Paul & Katja Seltmann
9:40 - 10:00 Sharing Data: Preparing and Moving Data to the Internet
making data useful, understandable in the outside world, properties, values and being systematic
Greg Riccardi
10:00-10:35 Break
10:35-12:00 Break out groups
TNRS, ECAT, QGIS, GEOLocate, CoGe, Data Cleaning: what is scripting? what is regex? examples in Open Refine, possibly in Symbiota, your own data issues / requests, Data Cleaning Exercise II - using Open Refine, part II (Using APIs, Taxonomic Name Resolution Services)
All
12:00-1:00 Lunch (Provided by Panera)
1:00-1:25 Data Publishing: in the context of the data life cycle
benefits, concerns, aggregators, citation, attribution
VertNet Norms For Data Use and Publication
Anne Thessen, http://datadetektiv.com/
1:30-2:15 iDigBio Portal Exercise: Using iDigBio portal to do something with data that can’t be done within a local system, Ex. PhyloJive Deb Paul & Katja Seltmann
2:15-2:45 Copyright / Intellectual Property
VertNet Guide to Copyright and Licenses for Dataset Publication
Jonathan Rees, Greg Riccardi, ...
3:00-3:20 Break
3:20-4:20 Second round of break-out groups
DwC-A publishing Exercise (or DEMO): using IPT instance OR
Symbiota DwC-A mapping and publishing exercise,
others
Edward Gilbert
4:20-4:40 Closing topics
a greater network, the global landscape, next steps
Katja Seltmann & Nico Franz
4:40-5:10 Participant 3 minute Presentations (1 slide)
5:10 - 5:30 Review Data Life Cycle we’ve walked through.
discussion, survey, next steps, and conclusions
all
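
The clustering topic listed under Data Cleaning Exercise II on Day 2 can be illustrated with a small Python sketch. It groups values that reduce to the same normalized key, in the spirit of the "fingerprint" key-collision method. This is a simplified illustration and not the exercise material; the function names and the collector names are made up, and which clustering method the exercise uses is an assumption.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: fold to ASCII, lowercase, drop punctuation,
    then sort the unique whitespace-separated tokens."""
    folded = unicodedata.normalize("NFKD", value)
    folded = folded.encode("ascii", "ignore").decode("ascii")
    folded = folded.strip().lower()
    folded = folded.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(folded.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Invented collector names with typical spreadsheet inconsistencies.
collectors = ["Doe, Jane", "Jane Doe", "jane doe ", "J. Smith", "Smith J", "A. Muñoz"]
clusters = cluster(collectors)
for group in clusters:
    print(group)
```

Each printed group is a set of spellings that a person would then review and merge by hand. Key-based grouping of this kind only catches differences in case, punctuation, accents, and word order; it does not catch misspellings, which need a distance-based method.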

Logistics

Adobe Connect Access

Adobe Connect will be used to give all participants access to workshop sessions and to let remote participants listen to the lectures.

Workshop Documents, Presentations, and Links

  • Google Collaborative Notes
  • links to any presentations (like power points) here
  • Darwin Core Terms
  • Participant Presentations

Pre-Workshop Reading List

Links beneficial for review

Workshop Recordings

Day 1

  • 8:30am-10:15am
  • 10:45am-11:00am
  • 11:15am-12pm
  • 1:00pm-2:30pm
  • 3:00-5:00pm

Day 2

  • 8:30am-10:15am
  • 10:45am-11:00am
  • 11:15am-12pm
  • 1:00pm-2:30pm
  • 3:00-5:00pm

Day 3

  • 8:30am-10:15am
  • 10:45am-11:00am
  • 11:15am-12pm
  • 1:00pm-3:30pm
  • 3:30-5:00pm

Resources and Links

Digitization Training Workshops Wiki Home