Data Sharing Data Standards and Demystifying the IPT
This wiki supports the Data Sharing, Data Standards and Demystifying the IPT Workshop held simultaneously at iDigBio at the University of Florida and at the Canadian Biodiversity Information Facility (CBIF) on 13-14 January 2015. It is the second in a series of biodiversity informatics workshops planned for the 2014-2015 fiscal year. The first one was Data Carpentry. The next one is Field to Database (March 9 - 12, 2015).
General Information
Description and Overview of Workshop. Are you a taxonomist collecting biological specimens in your research and vouchering them in collections? How does your specimen data get published? How does it get into collections databases? Are you a collection manager or data manager who would like to use the GBIF Integrated Publishing Toolkit v.2 (IPT) to publish your collection's datasets?
This workshop is for you if you:
- have a dataset and need/want to get it into a standard format for sharing
- manage data for a museum collection and would like to learn how to use the GBIF IPT
- want to understand more about Darwin Core and Data Sharing Standards
- would like to understand just what is meant by "Darwin Core Archive file (DwC-A)"
- want to learn how to create or update Darwin Core Archive files using the IPT
- would like to understand just a bit more about where data goes and how it gets there once it leaves your collection
- are a taxonomist with an occurrence dataset who would like to publish your dataset as a DwC-A with your related taxonomic publication
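To make "Darwin Core Archive file (DwC-A)" a little more concrete: an archive is a zip file that holds one or more delimited text files plus a `meta.xml` descriptor that says which column carries which Darwin Core term. The following is a minimal sketch in Python, not the IPT's own code. The function names, the four chosen terms, and the sample record are assumptions made for illustration, and the `eml.xml` metadata document that an IPT-produced archive also contains is left out here.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
# Column order of the core data file (illustrative choice of terms).
# The first column is also used as the record identifier.
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate"]


def build_meta_xml(fields):
    """Describe occurrence.txt so a consumer knows which column holds which term."""
    archive = ET.Element("archive", {"xmlns": "http://rs.tdwg.org/dwc/text/"})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in meta.xml
        "linesTerminatedBy": "\\n",    # written literally as \n in meta.xml
        "ignoreHeaderLines": "1",
        "rowType": DWC + "Occurrence",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = "occurrence.txt"
    ET.SubElement(core, "id", {"index": "0"})
    for index, name in enumerate(fields):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC + name})
    return ET.tostring(archive, encoding="unicode")


def build_archive(rows, fields=FIELDS):
    """Return the bytes of a zip file holding occurrence.txt and meta.xml."""
    data = io.StringIO()
    writer = csv.writer(data, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)  # header line, skipped by readers via ignoreHeaderLines
    for row in rows:
        writer.writerow([row.get(name, "") for name in fields])
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", data.getvalue())
        archive.writestr("meta.xml", build_meta_xml(fields))
    return buffer.getvalue()


# Invented sample record, for illustration only.
SAMPLE_ROWS = [{
    "occurrenceID": "urn:catalog:EXAMPLE:HERB:0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Quercus alba L.",
    "eventDate": "2014-06-21",
}]

archive_bytes = build_archive(SAMPLE_ROWS)
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as check:
    print(sorted(check.namelist()))
```

Writing `archive_bytes` to a file ending in `.zip` gives something you can open and inspect by hand. In the workshop itself the IPT builds this structure for you from the mappings you define.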
We'll focus on the concepts, skills, and tools we need to share biodiversity occurrence data and related data such as genomic data and media. Datasets will be taken from organismal and evolutionary biology, biodiversity science, ecology, and environmental science. The workshop format includes lectures and hands-on work, so participants are required to bring their own laptops. We will provide information and instructions on software installations. A pre-workshop online meeting is required for participants who wish to install the IPT software on their own laptops.
Note that this workshop does not focus on installing and setting up an IPT instance, but rather on using the installed software for data sharing and data publishing. A pre-workshop webinar on 7 January 2015, from 12 noon to 2 PM EST, will cover an Overview of the Considerations and Steps for IPT Install and Set up.
To Do For You: We encourage people attending this webinar to submit questions before the event so we can prepare answers to some of them ahead of time. Note: this is not an installation demonstration. We do plan to make an installation video available as one example of a typical installation process.
Updates will be posted to this website as they become available.
Pre-Workshop WEBINAR on IPT Install / Set up
Those interested in knowing more about the installation and configuration of the IPT will have an opportunity to be introduced to the topic during this 2-hour webinar. The event is open to anyone interested, so feel free to share the information with others. The official announcement and webinar link are available here.
The webinar will be held on January 7th, 2015, from noon to 2 PM EST, through the iDigBio Adobe Connect Platform.
Webinar Resources
- Introduction and context to the tool.
- User manual (available in English and Spanish).
- IPT project site.
- Presentation: introduction to the webinar.
- Presentation: Installation requirements.
- Presentation: Installation and set-up.
- Video: how to install the GBIF IPT (second link, private, for those in Agriculture Canada; file in Dropbox)
- Presentation: Tool customization.
- Presentation: Known issues and how to get help.
- Webinar Google Doc for Notes (Please use Workshop Google Doc for workshop notes).
- Evaluation form.
- Recording of Webinar
- Webinar Evaluation
Planning Team
Collaboratively brought to you by (in alphabetical order): Reed Beaman (NSF), Cathy Bester (iDigBio), Kyle Braak (GBIF), Matt Collins (ACIS - iDigBio), Shari Ellis (iDigBio), Alberto González-Talaván (GBIF), Chris Lewis (CBIF), Anissa Lybaert (CBIF), Kevin Love (iDigBio), James Macklin (CBIF), Derek Masaki (USGS - BISON), Andrea Matsunaga (ACIS - iDigBio), Joanna McCaffrey (iDigBio), Deborah Paul (FSU - iDigBio), Bénédicte Rivière (Canadensys), Laura Russell (VertNet), David Shorthouse (Canadensys), Dan Stoner (ACIS - iDigBio), Alex Thompson (ACIS - iDigBio)
About
Instructors (iDigBio): Laura Russell (VertNet), Derek Masaki (USGS), Alberto González-Talaván (GBIF), Deborah Paul (FSU - iDigBio)
Instructors (CBIF/Canadensys): David Shorthouse (Canadensys), James Macklin (CBIF)
Assistants (iDigBio): Andrea Matsunaga (ACIS - iDigBio), Joanna McCaffrey (iDigBio), Dan Stoner (ACIS - iDigBio), Matt Collins (iDigBio)
Assistants (CBIF/Canadensys): Christian Gendreau (Canadensys), Heather Cole (CBIF), Chris Lewis (CBIF), Anissa Lybaert (CBIF), Joel Sachs (CBIF), Allan Jones (CBIF), Glen Newton (CBIF), Satpal Bilkhu (CBIF)
Who: Regardless of title, this workshop is for you if you manage data for biological specimen collections (perhaps as a Data Manager or Collection Manager), have a dataset, and want to learn about data sharing and standards and how to share data using the DwC-A format the IPT produces. If you are a taxonomist and would like to publish your specimen data in DwC-A format with your taxonomic publications, you're also welcome to apply.
Skill Level: While we don't expect experts, we do expect computer skills commensurate with a Data Manager / Collection Manager.
Where: iDigBio in Gainesville, FL, and CBIF in Ottawa, ON, Canada (via teleconference using Adobe Connect)
Requirements: Participants must bring a laptop.
Contact (iDigBio Participants): Please email Deb Paul dpaul@fsu.edu for questions and information not covered here.
Contact (CBIF, Ottawa Participants): Please email James Macklin james.macklin@agr.gc.ca
Twitter: https://twitter.com/hashtag/iptworkshop/
Tuition for the course is free, but there is an application process and spots are limited. Currently, the workshop is full.
Software Installation Details
A laptop and a web browser are required for participants. Individualized IPTs will be installed on networked servers and made available to all participants for the duration of the workshop. These installations will not be publicly accessible and will not persist after the workshop ends. We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.
- Adobe Connect Systems Test
- Note when you follow the link to install and perform the test, some software will install (but it doesn't look like anything happens). To check, simply re-run the test.
If enterprising participants would like to install the IPT on a laptop prior to the workshop, a script is available to help streamline the necessary steps in a virtual machine. It requires an existing installation of, and experience with, VirtualBox (or libvirt) and Git.
- Install instructions: https://github.com/AAFC-MBB/gbif-ipt-vagrant
The pre-workshop Webinar focuses on Install and set up of IPT on a server.
Agenda
- (Adobe Connect Room Registered Room Link)
- Pre-workshop IPT Install and Set-up Webinar: January 7th, 2015 12-2 PM EST
- Monday evening, January 12th: pre-workshop informal dinner at Piesano's next door to the University Holiday Inn, 7 - 8 PM.
Schedule - subject to change.
Course Overview - Day 1 - Tuesday January 13th | ||
---|---|---|
8:30-8:45 | Check-in, name tags, log in, connect to wireless and Adobe Connect, Workshop Participant List | All (both locations) |
8:45-9:00 | Local logistics, etiquette for questions | Deb Paul, iDigBio (Gainesville); James Macklin, CBIF (Ottawa) |
9:00-9:20 | 1A: Introductions, Workshop Overview, Adobe Connect, Goals of the Workshop | Deb Paul, David Shorthouse, James Macklin |
9:20-10:15 | 1B: Theory: publishing basic primary biodiversity data: IPT and other methods. Benefits: data papers, Nature data descriptors. Standards: Darwin Core, Darwin Core Archive, TDWG and ratification process. Workflow: GBIF registry, harvesting, presentation | Alberto González-Talaván (Gainesville); James Macklin (Ottawa); Laura Russell (Gainesville) |
10:15-10:45 | Break (during this break, please verify you can log in to your IPT instance). | All |
10:45-11:40 | 1C: Theory: What are the metadata? Demo: instructor creates a resource in IPT and fills out metadata while everyone follows along | David Shorthouse (Derek Masaki, Deb Paul) |
11:40-12:00 | 1D: Theory: publishing with the IPT | Laura Russell (David Shorthouse) |
12:00-1:00 | Lunch (provided) | |
1:00-1:30 | 1D cont'd: Demo: adding a data source and mapping data | Laura Russell (David Shorthouse) |
1:30-2:30 | 1E: Theory: Complex primary biodiversity data. Demo: multimedia (Audubon Core) extension | Deb Paul, Derek Masaki, Laura Russell (David Shorthouse) |
2:30-3:00 | Break | |
3:00-4:30 | 1E cont'd: Demo: determination histories extension | Laura Russell, Deb Paul (David Shorthouse) |
4:30-5:00 | 1F: Wrap-up and review for tomorrow | |
6:00-8:30 | Night at the Museum reception (Gainesville only): meet in Holiday Inn Hotel Lobby for bus pick up at 6 PM | |
Course Overview - Day 2 - Wednesday January 14th | ||
8:30-9:00 | Check-in, log in, pairing, connect to wireless... | |
9:00-10:15 | 2A: Open Practical session (Gainesville participants choose format, e.g. work in pairs: admin + data manager) | |
10:15-10:45 | Break | |
10:45-12:00 | 2B: Open Practical session | |
12:00-1:00 | Lunch (provided) | |
1:00-2:30 | 2C: Administration functions and user management in the IPT (roles, permissions), registering data sets with GBIF (production mode instance) | Laura Russell (David Shorthouse) |
2:30-3:00 | Break | |
3:00-4:00 | 2D: Collaboration and the way forward | Alberto González-Talaván (David Shorthouse) |
4:00-4:30 | 2E: Summary of the webinar and workshop, evaluation and feedback, next steps (participants present at their own institutions - and report back/share presentation), wrap-up. | |
Link to Workshop Report
Logistics
- Area Restaurants
- Hotel / Food / Contact information / Per diem Logistics
- Workshop Calendar Announcement
- Participant List
Adobe Connect Access
Adobe Connect will be used to provide access for participants at The Canadian Biodiversity Information Facility (CBIF) in Ottawa, ON, Canada to instruction from iDigBio in Gainesville. Some instruction may come from CBIF to Gainesville. Workshop participants in both locations will be required to log in to the Adobe Connect room to facilitate communicating with each other. Already registered and accepted?
- Link to Adobe Room for Registered Participants
Workshop Documents, Presentations, and Links
- short link to this page http://tinyurl.com/iptworkshop
- Workshop Collaborative Notes in Google Doc
- Ottawa Participant and Sample Data Sets
- Gainesville Participant and Sample Data Sets
- Darwin Core Terms
- Participant Presentations. Participants were asked to volunteer to present one slide in a lightning talk session. 2 folks from Ottawa, 2 from Gainesville.
- Please share one thing you learned, one thing you realize you still need to learn more about, how you plan to learn more, and how you will share what you've learned with your colleagues after this workshop.
- Rick Levy, Database Associate, Denver Botanic Garden on "occurrenceID" and "recordID"
- Shelley James, Botanist/Collections Manager PCMB and Holly Bolick, Collections Manager - Invertebrate Zoology, Bishop Museum
- Nadia Cavallin, Herbarium Curator, Royal Botanical Gardens, Burlington
- Jennifer Wilkinson, Assistant Herbarium Curator Mycology, AAFC, Ottawa
Pre-Workshop Reading List
Links beneficial for review
- Darwin Core Terms Index
- Mapping to Old Versions (for those who might be familiar with older versions of DwC)
- Audubon Media Description standard terms index
- others? (Perhaps Canadensys and VertNet Norms, Canadensys Creative Commons licensing for occurrence data and VertNet Data Licensing Guide)
- iDigBio Data Ingestion Guidance
Workshop Recordings
Day 1
- 8:30am-10:15am
- 10:45am-11:00am
- 11:15am-12pm
- 1:00pm-2:30pm
- 3:00-5:00pm (not recorded, open discussion, see google doc for topics discussed)
Day 2
- 8:30am-10:15am (not recorded, open discussion, see google doc for topics discussed)
- 10:45am-12:00pm (not recorded, open discussion, see google doc for topics discussed)
- 1:00pm-2:30pm
- 3:00-5:00pm
Resources and Links
- Got a favorite resource (a book, a website) to share with your classmates?
- Canadensys Introduction to Darwin Core
- Experts Workshop on the GBIF Integrated Publishing Toolkit (IPT) v. 2
- Summary resources available from the IPT workshop held 20-22 June in Copenhagen, Denmark.
- Example of a Data Paper: Yves Bousquet, Patrice Bouchard, Anthony E. Davies, and Derek S. Sikes. 2013. Data associated with CHECKLIST OF BEETLES (COLEOPTERA) OF CANADA AND ALASKA. SECOND EDITION. DATA PAPER. ZooKeys. http://dx.doi.org/10.5886/998dbs2a
- For more Data Papers: http://biodiversitydatajournal.com/
- Darwin Core extension for germplasm Dag Endresen (on slideshare)
- Data exchange standards, protocols and formats relevant for the collection data domain within the GFBio network
- Check out this link if you'd like to see one example page about the multitude of current standards in use in the Natural History Collections and Culture Collections world; this example is from the German Federation for the Curation of Biological Data (GFBio).
- GBIF Darwin Core Archive, How-to Guide (download the pdf).
- GBIF Metadata Profile Reference Guide (download the pdf).
- Darwin Core Quick Reference Guide (download the pdf).
- Guide on how to use the BioVeL portal (includes a section on OpenRefine).
- lynda.com is a useful collection of tutorials on various IT and other resources - e.g. on relational databases
- For example see relational database fundamentals
- Do you want to share genetic sequence data for your specimens? Are the sequences in a database like GenBank? You can use the dwc:associatedSequences field to share links to the sequences and metadata about them. Note that you will soon be able to use the Material Sample Core, share more complex genomic data using the GGBN extensions, and use an extension to share information about the specimens from which the samples were taken.
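As a rough illustration of the dwc:associatedSequences suggestion above, the sketch below writes a tab-delimited occurrence file in which several sequence links for one specimen are joined into a single field. It assumes the " | " (space, vertical bar, space) separator that Darwin Core recommends for list-valued terms. The helper names, the record, and the accession identifiers are invented for the example and do not point to real GenBank entries.

```python
import csv
import io

SEPARATOR = " | "  # assumed list separator, per the Darwin Core recommendation
COLUMNS = ["occurrenceID", "scientificName", "associatedSequences"]


def join_sequences(links):
    """Join sequence links into one field value, dropping blank entries."""
    return SEPARATOR.join(link.strip() for link in links if link and link.strip())


def write_occurrences(records):
    """Return tab-delimited text with one row per occurrence record."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({
            "occurrenceID": record["occurrenceID"],
            "scientificName": record["scientificName"],
            "associatedSequences": join_sequences(record.get("sequences", [])),
        })
    return out.getvalue()


# Invented record with placeholder accession identifiers.
RECORDS = [{
    "occurrenceID": "urn:catalog:EXAMPLE:INV:0042",
    "scientificName": "Asterias rubens Linnaeus, 1758",
    "sequences": [
        "https://www.ncbi.nlm.nih.gov/nuccore/EXAMPLE0001",
        "https://www.ncbi.nlm.nih.gov/nuccore/EXAMPLE0002",
    ],
}]

print(write_occurrences(RECORDS))
```

A file like this could then be uploaded to the IPT as a source and the third column mapped to associatedSequences.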