Managing Natural History Collections Data for Global Discoverability: Difference between revisions

From iDigBio
 
(134 intermediate revisions by 6 users not shown)
Line 3: Line 3:
! colspan="2" style="background:#D58B28;width:200px;font-size:10pt" | Managing Natural History Collections Data for Global Discoverability
! colspan="2" style="background:#D58B28;width:200px;font-size:10pt" | Managing Natural History Collections Data for Global Discoverability
|-
|-
| colspan="2" style="text-align:center;font-size:7pt" | <!--YOU CAN INSERT A NEW IMAGE FOR THE LOGO BETWEEN THE COLON AND THE PIPE-->[[Image:|center|500px|]]<br />
| colspan="2" style="text-align:center;font-size:7pt" | <!--YOU CAN INSERT A NEW IMAGE FOR THE LOGO BETWEEN THE COLON AND THE PIPE-->[[Image:Capture1.JPG|center|200px|]]<br />
|-
|-
!colspan="2" style="background:#D58B28;text-align:center;font-size:9pt" | Quick Links for Managing Natural History Collections Data for Global Discoverability
!colspan="2" style="background:#D58B28;text-align:center;font-size:9pt" | Quick Links for Managing NHC Data for Global Discoverability wiki
|-
|[https://www.idigbio.org/content/managing-natural-history-collections-data-global-discoverability Managing NHC Data Announcement]
|-  
|-  
|[https://www.idigbio.org/wiki/index.php/Managing_Natural_History_Collections_Data_for_Global_Discoverability#Agenda Managing Natural History Collections Data for Global Discoverability Agenda]
|[https://www.idigbio.org/wiki/index.php/Managing_Natural_History_Collections_Data_for_Global_Discoverability#Agenda Managing NHC Data for Global Discoverability - Agenda]
|-  
|-  
|Managing Natural History Collections Data for Global Discoverability Biblio Entries
|Managing NHC Data for Global Discoverability Biblio Entries
|-  
|-  
|Managing Natural History Collections Data for Global Discoverability Report
|Managing NHC Data for Global Discoverability Report
|}
|}
This wiki supports the Managing Natural History Collections (NHC) Data for Global Discoverability Workshop and is ''in development''. This workshop is sponsored by iDigBio and hosted by the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new [https://www.flickr.com/photos/taxonbytes/15501490652/ Alameda space] on September 15-17, 2015. It is the fourth in a series of biodiversity informatics workshops held in fiscal year 2014-2015. The first three were 1)[https://www.idigbio.org/wiki/index.php/Data_Carpentry Data Carpentry], 2)[https://www.idigbio.org/wiki/index.php/Data_Sharing_Data_Standards_and_Demystifying_the_IPT Data Sharing Data Standards and Demystifying the IPT], and 3)[https://www.idigbio.org/wiki/index.php/Field_to_Database Field to Database (March 9 - 12, 2015)].
This wiki supports the Managing Natural History Collections (NHC) Data for Global Discoverability Workshop and is ''in development''. This workshop is sponsored by iDigBio and hosted by the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new [https://www.flickr.com/photos/taxonbytes/15501490652/ Alameda space] on September 15-17, 2015. It is the fourth in a series of biodiversity informatics workshops held in fiscal year 2014-2015. The first three were
1) [https://www.idigbio.org/wiki/index.php/Data_Carpentry Data Carpentry],  
2) [https://www.idigbio.org/wiki/index.php/Data_Sharing_Data_Standards_and_Demystifying_the_IPT Data Sharing Data Standards and Demystifying the IPT], and  
3) [https://www.idigbio.org/wiki/index.php/Field_to_Database Field to Database (March 9 - 12, 2015)].


== General Information ==
== General Information ==
 
[[Image:Application2.png|right|500px|]]
'''Description and Overview of Workshop.''' Are you:
'''Description and Overview of Workshop.''' Are you:
*actively digitizing NHC data and looking to do it more efficiently?
*actively digitizing NHC data and looking to do it more efficiently?
Line 26: Line 31:
The theme of the "Collections Data for Global Discoverability" workshop is ideally suited for natural history collections specialists aiming to increase the "research readiness" of their biodiversity data at a global scale. Have you found yourself in situations where you need to manage larger quantities of collection records, or encounter challenges in carrying out updates or quality checks? Do you mainly use spreadsheets (such as Excel) to clean and manage specimen-level datasets before uploading them into your collections database? The workshop is most appropriate for those who are relatively new to collections data management and are motivated to provide the global research community with accessible, standards- and best practices-compliant biodiversity data.
The theme of the "Collections Data for Global Discoverability" workshop is ideally suited for natural history collections specialists aiming to increase the "research readiness" of their biodiversity data at a global scale. Have you found yourself in situations where you need to manage larger quantities of collection records, or encounter challenges in carrying out updates or quality checks? Do you mainly use spreadsheets (such as Excel) to clean and manage specimen-level datasets before uploading them into your collections database? The workshop is most appropriate for those who are relatively new to collections data management and are motivated to provide the global research community with accessible, standards- and best practices-compliant biodiversity data.


During the workshop essential information science and biodiversity data concepts will be introduced (i.e., data tables, data sharing, quality/cleaning, Darwin Core, APIs). Hands on data cleaning exercises using spreadsheet programs and readily usable and free software will be performed. The workshop is platform independent, and thus will not focus on the specifics of one or the other locally preferred biodiversity database platforms, instead addressing fundamental themes and solutions that will apply to a variety of database applications.
During the workshop essential information science and biodiversity data concepts will be introduced (i.e., data tables, data sharing, quality/cleaning, Darwin Core, APIs). Hands-on data cleaning exercises using spreadsheet programs and readily usable and free software will be performed. The workshop is platform independent, and thus will not focus on the specifics of one or the other locally preferred biodiversity database platforms, instead addressing fundamental themes and solutions that will apply to a variety of database applications.


<!-- We'll discuss and focus on the concepts, skills, and tools we need to share biodiversity occurrence data and related data such as genomics, and media. Datasets will be taken from organismal and evolutionary biology, biodiversity science, ecology, and environmental science. The workshop format includes lectures and hands-on work, so participants are required to bring their own laptops. We will provide information and instructions on any necessary software installations.-->
<!-- We'll discuss and focus on the concepts, skills, and tools we need to share biodiversity occurrence data and related data such as genomics, and media. Datasets will be taken from organismal and evolutionary biology, biodiversity science, ecology, and environmental science. The workshop format includes lectures and hands-on work, so participants are required to bring their own laptops. We will provide information and instructions on any necessary software installations.-->


To Do For You: Pre-reading materials [Darwin Core Data Standard, Best Practices for Data Management,...]
'''To Do For You:''' Pre-reading materials  
*[http://doi.org/10.1371/journal.pone.0029715 Darwin Core paper]


Updates will be posted to this website as they become available.
Updates will be posted to this website as they become available.
Line 39: Line 45:
===About===
===About===


'''Instructors (iDigBio):''' Katja Seltmann, Amber Budden, Edward Gilbert, Nico Franz, Mark Schildhauer, Greg Riccardi, Deborah Paul
'''Instructors (iDigBio):''' Katja Seltmann, Amber Budden, Edward Gilbert, Nico Franz, Greg Riccardi, Deborah Paul, Joanna McCaffrey, Kevin Love, Anne Thessen, David Bloom


'''Skill Level:''' We are focusing our efforts in this workshop on beginners.
'''Skill Level:''' We are focusing our efforts in this workshop on beginners.


'''Where and When:''' Tempe, AZ at the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new [https://www.flickr.com/photos/taxonbytes/15501490652/ Alameda space], September 15 - 16, 2015
'''Where and When:''' Tempe, AZ at the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new [https://www.flickr.com/photos/taxonbytes/15501490652/ Alameda space], September 15 - 17, 2015


'''Requirements:''' Participants must bring a laptop.
'''Requirements:''' Participants must bring a laptop.
Line 51: Line 57:
'''Twitter:'''  
'''Twitter:'''  


Tuition for the course is free, but there is an application process and spots are limited. [Apply here]
Tuition for the course is free, but there is an application process and spots are limited (and class is full).


===Software Installation Details===
===Software Installation Details===
Line 61: Line 67:


==Agenda==
==Agenda==
*Managing NHC Data Adobe Connect Room (to be linked - stay tuned)
*Managing NHC Data Adobe Connect Room http://idigbio.adobeconnect.com/nhcdata
*Monday evening, September 14th: pre-workshop informal get-together at [to be decided], from [time to be decided].
*Monday evening, September 14th: pre-workshop informal get-together at Vine Tavern and Eatery, 6 PM.


Schedule - subject to change.
Schedule - subject to change.
Line 69: Line 75:
!colspan="3"| Course Overview - Day 1 - Tuesday September 15th
!colspan="3"| Course Overview - Day 1 - Tuesday September 15th
|-
|-
|8:15-8:45
|8:15-8:30
|Check-in, name tags, log in, connect to wireless and Adobe Connect
|Check-in, name tags, log in, connect to wireless and Adobe Connect
|All
|All
|-
|-
|8:45-9:00
|8:30-9:15
|Welcome, Introductions, Logistics, Intro to the Workshop
|[http://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/01_WelcomeWhyThisWorkshop.pptx Welcome, Logistics, Intro to the Workshop, Why Share Data? Why this workshop?]
|Deb Paul, iDigBio
[http://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/TUE_0830_Session1_WhyThisWorkshop_Pt2_Budden-clean.pptx Why this Workshop?, part 2]
|-
:quick exercise - what are your data challenges? what software do you use?
|9:00-9:15
:key point - why share data?
|Why this workshop?
|Deb Paul, Amber Budden
:quick exercise - what are your data challenges? what software do you use? how do you track / document your procedures?
|Amber Budden & Deb Paul
|-
|-
|09:15-9:35
|09:15-9:35
|General Concepts and Best Practices<br/>
|[http://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/TUE_0915_Session2_ConceptsBestPractices_Budden-clean.pptx General Concepts and Best Practices]
:brief introduction to data modeling, the data life-cycle, and relational databases
:the data life-cycle, brief introduction to data modeling, and relational databases
|Ed Gilbert and Amber Budden
|Ed Gilbert and Amber Budden
|-
|-
|9:35-9:55
|9:35-9:55
|Overview of Data standards<br/>
|[http://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/03_OverviewOfDataStandards.pptx Overview of Data standards]
:Darwin Core, EML, Audubon Core, GGBN, DwC-A, Identifiers (GUIDs vs local)
:Darwin Core, EML, Audubon Core, GGBN, DwC-A, Identifiers (GUIDs vs local)
|Ed Gilbert, Deb Paul
|Ed Gilbert, Deb Paul
|-
|-
|10:00-10:30
|10:00-10:30
|Hands-on Exercise with Specimen Data Set<br/>
|Introduction to Mapping Data
:hands-on exercise with occurrence specimen data set
:data set with known mapping / standardization issues.
:data set with known mapping / standardization issues.
:[http://rs.tdwg.org/dwc/terms/index.htm Darwin Core Terms]
:[https://drive.google.com/file/d/0B0Rlroh4mbthTUFEYTVUU2hZNjQ/view?usp=sharing Sample Data]
:[http://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/SampleDataSetIssues_WorkshopVersion.docx Known Issues in Sample Data]
|All
|All
|-
|-
| style="background-color: #eee;" | 10:30-10:50
| style="background-color: #eee;" | 10:30-10:50
| style="background-color: #eee;" | Break
| style="background-color: #eee;" | Break
| style="background-color: #eee;" | all
| style="background-color: #eee;" |  
|-
|-
|10:50-11:30
|10:50-11:30
|Data Management Planning<br/>
|Data Management Planning
:choosing a database, data flow, data backup, field-to-database, metadata
:[https://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/JMc_Tempe_CMSConsiderations.pptx choosing a collection management system], data flow, data backup, field-to-database, metadata
|Amber Budden
|Amber Budden and Joanna McCaffrey
|-
|-
|11:30-12:00
|11:30-12:00
|Exercise DataONE Lesson 4: best practices for data entry and data manipulation
|DataONE Lesson 4
:[http://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/TUE_1130_Session6_DataEntry_BestPractices_Budden-clean.pptx best practices for data entry and data manipulation]
|Amber Budden
|Amber Budden
|-
|-
Line 115: Line 124:
|-
|-
|1:00-1:30
|1:00-1:30
|Images and media issues: a brief intro<br/>
|Images and media issues: a brief intro
:choosing a camera, issues across different database platforms, image submissions, linking images to occurrence records, batch processing
:[https://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/JMc_Photography101_Tempe.pptx choosing a camera], issues across different database platforms, image submissions, linking images to occurrence records, batch processing, dams
|Ed Gilbert
|Ed Gilbert and Joanna McCaffrey
|-
|1:30-1:50
|Digitization workflows and process<br/>
:getting started, prioritization, specimen collecting, new database, integrating old data
|Deb Paul, Ed Gilbert & Katja Seltmann
|-
|1:50-2:10
|Common Workflows<br/>
:image to data, specimen to data, skeletal records, crowd-sourcing, OCR/NLP, georeferencing, metadata
|Deb Paul, Ed Gilbert & Katja Seltmann
|-
|-
|2:10-2:25
|1:30-2:00
|Optimization:  
|[http://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/07_CommonWorkflows.pptx Digitization workflows and process: Common Workflows and Optimization]
:getting started, prioritization, specimen collecting, new database, and integrating old data.
:Image to data, specimen to data, to-the-web and skeletal records.
:Reviewing your own workflow, common bottlenecks, policy, documentation  
:Reviewing your own workflow, common bottlenecks, policy, documentation  
|Katja Seltmann, Deb Paul & Ed Gilbert
|Katja Seltmann, Deb Paul & Ed Gilbert
|-
|-
|2:25-3:00
|2:00 - 3:00
|Hands-on exercise (to be decided)
|Collections Tours and Symbiota Demo. (groups of 10)
|tbd
:Digitization in Action: Insects, Botany, Symbiota
|All
|-
|-
| style="background-color: #eee;" | 3:00-3:20
| style="background-color: #eee;" | 3:00-3:20
Line 143: Line 145:
|-
|-
|3:20-3:50
|3:20-3:50
|Georeferencing Data (Georeferencing Workflow)<br/>
|Georeferencing Data (Georeferencing Workflow)
:visualization tools, when to georeference, best practices
:visualization tools, when to georeference, best practices (the import of standards): error uncertainty, georeferencingRemarks
|Ed Gilbert
|Ed Gilbert
|-
|-
|3:50-4:10
|3:50-4:10
|GEOLocate Exercise (May be DEMO)<br/>
|GEOLocate Exercise (May be DEMO)
:CoGe, GPS Visualizer, re-integration, qc
:CoGe, GPS Visualizer, re-integration, qc
:Folks can preregister to GEOLocate Collaborative Georeferencing using the link below. Doing so will automatically register them for the Phoenix community project that Ed created. If you already have a login, you can use the link to just register your existing account to the Phoenix project.
::http://www.museum.tulane.edu/coge/WebComEasySignUp.aspx?ajc=915E2056
|Ed Gilbert
|Ed Gilbert
|-
|-
|4:40-5:30
|4:40-5:30
|Conversation, overview of day, '''volunteers''', preview for tomorrow...
|Conversation, overview of day, preview for tomorrow, backpack logistics for tomorrow, ...
|All
|All
|-
|-
|(Optional Evening Activity?)
|
|
|
|-
|-
Line 162: Line 166:
|-
|-
|8:30-12:00
|8:30-12:00
|[http://www.dbg.org/ Desert Botanical Garden (DBG) Field Trip] and Lunch<br/>meet at 8:30 in Hotel Lobby, depart at 8:40 for DBG; garden from 9-11:30, lunch 11:30 - 12:30, depart 12:40 to ASU
|[http://www.dbg.org/ Desert Botanical Garden (DBG) Field Trip] and Lunch
:meet at 7:55 in Hotel Lobby, depart at 8:00 and 8:30 for DBG; garden from 9-11:30, lunch 11:30 - 12:30, aim to depart 12:00 and 12:30 to ASU. Bring a hat!
|  
|  
|-
|-
| style="background-color: #eee;" |12:00-1:00
| style="background-color: #eee;" |11:30-12:30
| style="background-color: #eee;" | Lunch at Gertrude's (in the Garden)
| style="background-color: #eee;" | Lunch at Gertrude's (in the Garden) YUM!
| style="background-color: #eee;" |  
| style="background-color: #eee;" |  
|-
|-
|1:00-1:25
|2:00-2:35
|Welcome Back and Intro to Data Quality<br/>
|Welcome Back and Intro to Data Quality
:inside the data-life-cycle, cost of data quality, quality vs completeness
:inside the data-life-cycle, cost of data quality, quality vs completeness
|Amber Budden, Ed Gilbert
|Amber Budden, Greg Riccardi, (Ed Gilbert)
|-
|-
|1:25-1:40
|2:35-2:45
|Data Cleaning<br/>
|[https://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/02_ReviewTools.ppt Review Tools for Data Cleaning, Data Manipulation, and Visualization] (and Lessons)
:Spreadsheets, Kurator, GPS Visualizer, GEOLocate, CoGe, Google Maps, CartoDB, Google Fusion Tables, Notepad++, Open Refine, BioVeL, Access, (others), iDigBio recordset data cleaning, iPlant TNRS, RegEx
:Where do they fit in your workflow?
|Deb Paul
|-
|2:45-2:50
|[https://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/03_DataCleaningWorkflows145PM-Wednesday.ppt Data Cleaning]
:where, when and how does it happen?, what kind of feedback to expect
:where, when and how does it happen?, what kind of feedback to expect
:types of common errors and omissions, best practices strategies, feedback and annotation, error tracking, automation, policies and protocols  
:types of common errors and omissions, best practices strategies, feedback and annotation, error tracking, automation, policies and protocols  
|Deb Paul & Katja Seltmann
|Deb Paul & Katja Seltmann
|-
|-
|1:40-2:20
|2:50-3:40
|Data Cleaning Exercise I<br/>
|Data Cleaning Exercise I
:(opt: quick exercise - spot the snafus)
:better spreadsheet skills (Data Carpentry)
:better spreadsheet skills (Data Carpentry)
|Deb Paul & Katja Seltmann
:http://idigbio.github.io/spreadsheet-skills/00-intro.html
|-
|Katja Seltmann & Deb Paul
|2:20-2:50
|Data Cleaning Exercise II<br/>
:Open Refine, part I (facets, clustering)
|Deb Paul & Katja Seltmann
|-
|-
| style="background-color: #eee;" | 2:50-3:10
| style="background-color: #eee;" | 3:40-4:00
| style="background-color: #eee;" | Break
| style="background-color: #eee;" | Break
| style="background-color: #eee;" |
| style="background-color: #eee;" |
|-
|-
|3:10 - 4:40
|4:00 - 5:00
|time for discussion / break outs / unconference topics or demos
|Data Cleaning Exercise I
:better spreadsheet skills (Data Carpentry), continued...
|Katja Seltmann & Deb Paul
|-
|5:00-5:15
|Data Cleaning Exercise II
:Open Refine, part I (facets, clustering)
:https://idigbio.github.io/open-refine/00-getting-started.html
:https://wiki.biovel.eu/display/doc/Installing+and+running+DR+Workflow+on+Taverna+Workbench#InstallingandrunningDRWorkflowonTavernaWorkbench-InstallingGoogleRefine
:http://multimedia.journalism.berkeley.edu/tutorials/google-refine-export-json/
|Deb Paul
|-
|5:15-5:30
|Conversation, overview of day for context and questions, '''homework''' and preview for tomorrow...
|Deb Paul & Katja Seltmann
|Deb Paul & Katja Seltmann
|-
|-
|4:40-5:00
|Evening Activity (opt)
|Conversation, overview of day for context and questions, preview for tomorrow...
|Insect Collecting Opportunity
|Deb Paul & Katja Seltmann
:Sign Up and Details - [http://asu-entomology.wikispaces.com/Fall+2015+Collecting Wednesday night insect collecting trip to Mesquite Wash] <br/> Pictures Please!
|Host - Nico Franz
|-
|-
!colspan="3"| Course Overview - Day 3 - Thursday September 17th
!colspan="3"| Course Overview - Day 3 - Thursday September 17th
|-
|-
|8:45-9:00
|8:30-9:00
|Discussion of Material Covered so far and Overview of Day 3
|Discussion of Material Covered so far, Overview of Day 3, Set up breakout groups
|Katja Seltmann
|Katja Seltmann
|-
|-
|9:00-9:30
|9:00-10:00
|Review Tools for Data Cleaning, Data Manipulation, and Visualization (and Lessons)
|Potential break out groups
:Kurator, GPS Visualizer, GEOLOcate, Google Fusion Tables, Notepad ++, Open Refine, Access,(others)
:Taxonomic Names issues - TNRS, ECAT
:Where do they fit in your workflow?
:GEOLocate, CoGe, Georeferencing Workflows, Workshops
|Deb Paul & Katja Seltmann
:Data Cleaning: what is scripting? what is regex? examples in Open Refine, possibly in Symbiota
:your own data issues / requests
:Refine, part II (Using APIs, Taxonomic Name Resolution Services)
:More about choosing software / and the "build-your-own" scenario
:More about identifiers
:More on imaging issues (what camera to purchase, etc.)
:On OCR, NLP, duplicate harvesting
:DataONE Data Management Planning Tool
:What is Data Carpentry?
:Text Editors
:rAPI
|All
|-
| style="background-color: #eee;" | '''10:00-10:35'''
| style="background-color: #eee;" | Break
| style="background-color: #eee;" |  
|-
|-
|30min
|10:35 - 10:55
|Identifiers
|Sharing Data: Preparing and Moving Data to the Internet
:making data useful, understandable in the outside world, properties, values and being systematic
|Greg Riccardi
|Greg Riccardi
|-
|-
| style="background-color: #eee;" | 20min
|10:55-11:20
| style="background-color: #eee;" | Break
|Data Publishing: in the context of the data life cycle
| style="background-color: #eee;" |
:benefits, concerns, aggregators, citation, attribution
:VertNet [http://vertnet.org/resources/norms.html Norms For Data Use and Publication]
|Anne Thessen, http://datadetektiv.com/
|-
|-
|1hr 20min
|11:20-11:40
|Break out groups<br/>TNRS,ECAT,QGIS,GEOLocate,CoGe,Data Cleaning: what is scripting? what is regex? examples in Open Refine, possibly in Symbiota, your own data issues / requests,Data Cleaning Exercise II - using Open Refine, part II (Using APIs, Taxonomic Name Resolution Services)
|[https://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/JMc_Tempe_DataToiDigBio.pptx Getting Your Data Published: Sending Data to iDigBio]
|All
:from you to us, the details, the options
|Joanna McCaffrey
|-
|-
| style="background-color: #eee;" |12:00-1:00
| style="background-color: #eee;" |12:00-1:00
Line 231: Line 270:
| style="background-color: #eee;" |  
| style="background-color: #eee;" |  
|-
|-
|1:00-1:25
|1:00-1:45
|Data Publishing: in the context of the data life cycle<br/>benefits, concerns, aggregators, citation, attribution
|Feedback from iDigBio as part of the Data Life Cycle and an iDigBio Portal Exercise
| tbd
:[[Media:Recordset-cleaning.pdf| iDigBio Data Management and Recordset Data Quality]]
:[https://www.idigbio.org/content/improving-data-quality-idigbio-recordset-data-cleaning-method-tools-and-data-flags Webinar coming up - Improving Data Quality: iDigBio Recordset data cleaning method, tools, and data flags] October 23rd, 2015
:Using the iDigBio Portal and integrated research tools (PhyloJive, LifeMapper)
:https://goo.gl/gyRwx7
:http://idigbio.github.io/spreadsheet-skills/09-iDigBio-portal.html
|Kevin Love, Katja Seltmann and Deb Paul
|-
|-
|1:30-2:15
|1:45-2:05
|iDigBio Portal Exercise: Using iDigBio portal to do something with data that can’t be done within a local system, Ex. PhyloJive
|[https://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/Law%26Ethics_DataDiscovery_iDigBio.pptx Copyright / Intellectual Property]
|Deb Paul & Katja Seltmann
:VertNet [http://vertnet.org/resources/datalicensingguide.html Guide to Copyright and Licenses for Dataset Publication]
|-
::[http://vertnet.org/resources/norms.html VertNet Norms]
|2:15-2:45
:[https://www.idigbio.org/content/idigbio-terms-use-policy iDigBio Terms of Use and Citation]
|Copyright / Intellectual Property
|David Bloom, Jonathan Rees, Greg Riccardi
|tbd
|-
|-
| style="background-color: #eee;" | 3:00-3:20
| style="background-color: #eee;" | 3:00-3:20
Line 247: Line 290:
| style="background-color: #eee;" |
| style="background-color: #eee;" |
|-
|-
|3:20-4:20
|3:20-5:00
|Second round of break-out groups<br/>DWC-A publishing Exercise (or DEMO): using IPT instance OR Symbiota DwC-A mapping and publishing exercise
|Second round of break-out groups
|
:DWC-A publishing Exercise (or DEMO): using IPT instance
|-
::[https://www.idigbio.org/sites/default/files/workshop-presentations/managing-nhc-data/sample-data/sampleoccurrence_dupfixed.txt Sample Dataset]
|4:20-4:40
::your email and "password"
|Closing topics<br/>a greater network, the global landscape, next steps
:::http://iptworkshop.idigbio.org/ (your email prefix)
|Katja Seltmann & Nico Franz
:Symbiota DwC-A mapping and publishing exercise,
:others
|Edward Gilbert
|-
|-
|4:40-5:10
|5:00 -5:30
|Participant 3 minute Presentations (1 slide)
|Closing topics
|
:What are your next steps for moving forward
:guided discussion, survey, and thanks!, ...
|Katja Seltmann & all
|-
|-
|5:10 - 5:30
|Review Data Life Cycle we’ve walked through.<br/>discussion, survey, next steps, and conclusions
|all
|}
|}


==Logistics==
==Logistics==
*link to local area activities / restaurants
*[[Media:MapToManagingNHCdataWorkshopAndLodging.pdf|Map showing Hotel and Workshop Locations]] (pdf)
*logistics for hotel / food / per diem / map
*[[Media:logistics_phoenix_V2.pdf|Logistics for hotel / per diem / contacts / transportation]] (pdf)
*[[Media:restaurants.pdf|Some restaurants near the Hotel 1333]] (pdf)
**[[Media:restaurantlist.pdf|List of Restaurants]] (pdf)
*[https://www.idigbio.org/content/managing-natural-history-collections-data-global-discoverability Workshop Calendar Announcement]
*[https://www.idigbio.org/content/managing-natural-history-collections-data-global-discoverability Workshop Calendar Announcement]
*Paticipant List
*[https://docs.google.com/spreadsheets/d/1P-8arv0aEG5Koo3uqSywY24t_L0mUXDsQ1mik0e7q10/edit?usp=sharing Participant List]


===Adobe Connect Access===
===Adobe Connect Access===
Adobe Connect will be used so that everyone, including remote participants, can listen to the lectures.
Adobe Connect will be used so that everyone, including remote participants, can listen to the lectures.
*[http://idigbio.adobeconnect.com/nhcdata Adobe Connect for Remote Listening]


==Workshop Documents, Presentations, and Links==
==Workshop Documents, Presentations, and Links==
*Google Collaborative Notes
*[https://docs.google.com/document/d/1uVKwl7BR_G_5iIIazHHW0YLPVSgWqAftWu_F5c7WCpA/edit Google Collaborative Notes]
**These are notes with benefits.
*links to any presentations (like power points) here
*links to any presentations (like power points) here
*[http://rs.tdwg.org/dwc/terms/ Darwin Core Terms]
*[http://rs.tdwg.org/dwc/terms/ Darwin Core Terms]
Line 291: Line 339:
==Workshop Recordings==
==Workshop Recordings==
====Day 1====
====Day 1====
*8:30am-10:15am
*8:30am-10:15am http://idigbio.adobeconnect.com/p1w43drdjqp/
*10:45am-11:00am
*1:00pm-2pm http://idigbio.adobeconnect.com/p2jibddghos/
*11:15am-12pm
*3:30-5:30pm http://idigbio.adobeconnect.com/p3rgo6o79wk/
*1:00pm-2:30pm
*3:00-5:00pm


====Day 2 ====
====Day 2 ====
*8:30am-10:15am
*2:00pm-4pm http://idigbio.adobeconnect.com/p88qwbfsh33/
*10:45am-11:00am
*4-5:30pm http://idigbio.adobeconnect.com/p6956srj6iu/
*11:15am-12pm
*1:00pm-2:30pm
*3:00-5:00pm


====Day 3 ====
====Day 3 ====
*8:30am-10:15am
*10:35am-12:00pm http://idigbio.adobeconnect.com/p38bkyk8uge/
*10:45am-11:00am
*1:00pm-3:30pm http://idigbio.adobeconnect.com/p4fkezirk1j/
*11:15am-12pm
*3:30-5:00pm http://idigbio.adobeconnect.com/p8slidpxlc7/
*1:00pm-3:30pm
*3:30-5:00pm


==Resources and Links==
==Resources and Links==
Line 328: Line 369:
**For example see [http://www.lynda.com/Access-tutorials/Relational-Database-Fundamentals/145932-2.html relational database fundamentals]
**For example see [http://www.lynda.com/Access-tutorials/Relational-Database-Fundamentals/145932-2.html relational database fundamentals]
*You want to share genetic sequence data for your specimens? Are the sequences in a database like GenBank? You can use the [http://rs.tdwg.org/dwc/terms/#associatedSequences dwc:associatedSequences field] to share links to the sequences and metadata about them. Note that you will soon be able to use the Material Sample Core, share more complex genomic data using the GGBN extensions, and use an extension to share information about the specimens from which the samples were taken.
*You want to share genetic sequence data for your specimens? Are the sequences in a database like GenBank? You can use the [http://rs.tdwg.org/dwc/terms/#associatedSequences dwc:associatedSequences field] to share links to the sequences and metadata about them. Note that you will soon be able to use the Material Sample Core, share more complex genomic data using the GGBN extensions, and use an extension to share information about the specimens from which the samples were taken.
*[https://www.idigbio.org/wiki/images/2/20/Digitization_info_from_the_INHS.pdf INHS digitization system shopping list + info on setup]
*Want to learn SQL?
**http://sqlschool.modeanalytics.com
**http://www.headfirstlabs.com/books/hfsql/
*Teaching Reproducible Research best practices
**[http://reproducible-science-curriculum.github.io/2015-06-01-reproducible-science-idigbio/#schedule Reproducible Research Workshop]
*STEM - What can you do to help?
**Read! [http://www.aauw.org/research/why-so-few/ Why So Few?]
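
The dwc:associatedSequences suggestion in the list above can be illustrated with a small sketch. It is not part of the workshop materials; it assumes that several sequence links for one specimen are stored in a single field separated by " | " (the separator Darwin Core recommends for list values). The helper names, the catalog number, the scientific name, and the accession placeholders are all hypothetical.

```python
import csv
import io

SEPARATOR = " | "  # list separator recommended for Darwin Core list-valued terms


def join_sequences(urls):
    """Combine sequence URLs into a single dwc:associatedSequences value."""
    cleaned = [url.strip() for url in urls if url.strip()]
    return SEPARATOR.join(cleaned)


def split_sequences(value):
    """Recover the individual URLs from an associatedSequences value."""
    return [part.strip() for part in value.split("|") if part.strip()]


def to_csv(records, fieldnames):
    """Write records as CSV text with Darwin Core term names as headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical record: catalog number, name, and accessions are placeholders.
record = {
    "catalogNumber": "EX-0001",
    "scientificName": "Exemplum fictum",
    "associatedSequences": join_sequences([
        "https://www.ncbi.nlm.nih.gov/nuccore/ACCESSION_1",
        "https://www.ncbi.nlm.nih.gov/nuccore/ACCESSION_2 ",
    ]),
}

if __name__ == "__main__":
    print(to_csv([record], list(record)))
```

Keeping the links in one delimited field means the occurrence record stays a single spreadsheet row, which fits the spreadsheet-based cleaning exercises on the agenda; a collection database may of course store the links in a separate table instead.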


==[[Digitization Training Workshops|Digitization Training Workshops Wiki Home]]==
==[[Digitization Training Workshops|Digitization Training Workshops Wiki Home]]==

Latest revision as of 22:25, 21 September 2015

Managing Natural History Collections Data for Global Discoverability

Quick Links for Managing NHC Data for Global Discoverability wiki
Managing NHC Data Announcement
Managing NHC Data for Global Discoverability - Agenda
Managing NHC Data for Global Discoverability Biblio Entries
Managing NHC Data for Global Discoverability Report

This wiki supports the Managing Natural History Collections (NHC) Data for Global Discoverability Workshop and is in development. This workshop is sponsored by iDigBio and hosted by the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new Alameda space on September 15-17, 2015. It is the fourth in a series of biodiversity informatics workshops held in fiscal year 2014-2015. The first three were 1) Data Carpentry, 2) Data Sharing Data Standards and Demystifying the IPT, and 3) Field to Database (March 9 - 12, 2015).

General Information

Description and Overview of Workshop. Are you:

  • actively digitizing NHC data and looking to do it more efficiently?
  • getting ready to start digitizing NHC data and looking to learn some new skills to enhance your workflow?
  • digitizing someone else’s specimens (e.g., as part of a research project)?
  • finding yourself in the role of the museum database manager (even though it may not be your title or original job)?
  • someone who has a private research collection who wishes to donate specimens and data to a public collection?

The "Collections Data for Global Discoverability" workshop is ideally suited for natural history collections specialists aiming to increase the "research readiness" of their biodiversity data at a global scale. Have you found yourself needing to manage larger quantities of collection records, or encountering challenges in carrying out updates or quality checks? Do you mainly use spreadsheets (such as Excel) to clean and manage specimen-level datasets before uploading them into your collections database? The workshop is most appropriate for those who are relatively new to collections data management and are motivated to provide the global research community with accessible biodiversity data that complies with standards and best practices.

During the workshop, essential information science and biodiversity data concepts will be introduced (e.g., data tables, data sharing, data quality and cleaning, Darwin Core, APIs). Participants will perform hands-on data cleaning exercises using spreadsheet programs and free, readily usable software. The workshop is platform independent: it will not focus on the specifics of any one locally preferred biodiversity database platform, but will instead address fundamental themes and solutions that apply to a variety of database applications.
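As a rough illustration of the kind of mapping and cleaning the exercises deal with, the sketch below renames spreadsheet columns to Darwin Core terms and flags coordinates that cannot be valid. It is not taken from the workshop materials: the local column headers, the `COLUMN_MAP` table, the helper names `to_darwin_core` and `coordinate_issues`, and the two sample rows are all made up for this example. Only the Darwin Core term names themselves (`catalogNumber`, `scientificName`, `recordedBy`, `eventDate`, `decimalLatitude`, `decimalLongitude`) come from the standard.

```python
import csv
import io

# Hypothetical local spreadsheet headers mapped to Darwin Core terms.
COLUMN_MAP = {
    "Cat #": "catalogNumber",
    "Species": "scientificName",
    "Collector": "recordedBy",
    "Date": "eventDate",
    "Lat": "decimalLatitude",
    "Long": "decimalLongitude",
}


def to_darwin_core(rows):
    """Rename known columns to Darwin Core terms and trim stray whitespace."""
    mapped = []
    for row in rows:
        record = {}
        for header, value in row.items():
            term = COLUMN_MAP.get(header.strip())
            if term is None:
                continue  # unmapped columns are simply skipped in this sketch
            record[term] = (value or "").strip()
        mapped.append(record)
    return mapped


def coordinate_issues(record):
    """Return a list of human-readable problems with the record's coordinates."""
    issues = []
    for term, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        raw = record.get(term, "")
        try:
            value = float(raw)
        except ValueError:
            issues.append(f"{term} is not a number: {raw!r}")
            continue
        if abs(value) > limit:
            issues.append(f"{term} out of range: {value}")
    return issues


# Made-up sample rows; the second one has an impossible latitude.
SAMPLE = """Cat #,Species,Collector,Date,Lat,Long
ASU0001, Larrea tridentata ,A. Collector,2015-09-15,33.42,-111.93
ASU0002,Carnegiea gigantea,B. Collector,2015-09-16,133.42,-111.93
"""

records = to_darwin_core(csv.DictReader(io.StringIO(SAMPLE)))
for record in records:
    print(record["catalogNumber"], coordinate_issues(record))
```

The same two steps (map headers to standard terms, then check values against simple rules) can be done by hand in a spreadsheet or with facets in a tool such as Open Refine; the script form just makes the rules explicit and repeatable.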


To Do For You: Pre-reading materials

Updates will be posted to this website as they become available.

Planning Team

Collaboratively brought to you by: Katja Seltmann (AMNH - TTD-TCN), Amber Budden (DataONE), Edward Gilbert (ASU - Symbiota), Nico Franz (ASU), Mark Schildhauer (NCEAS), Greg Riccardi (FSU - iDigBio), Reed Beaman (NSF), Cathy Bester (iDigBio), Shari Ellis (iDigBio), Kevin Love (iDigBio), Deborah Paul (FSU - iDigBio)

About

Instructors (iDigBio): Katja Seltmann, Amber Budden, Edward Gilbert, Nico Franz, Greg Riccardi, Deborah Paul, Joanna McCaffrey, Kevin Love, Anne Thessen, David Bloom

Skill Level: We are focusing our efforts in this workshop on beginners.

Where and When: Tempe, AZ at the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new Alameda space, September 15 - 17, 2015

Requirements: Participants must bring a laptop.

Contact (iDigBio Participants): Please email Deb Paul dpaul@fsu.edu for questions and information not covered here.

Twitter:

Tuition for the course is free, but there is an application process and spots are limited. The class is now full.

Software Installation Details

A laptop and a web browser are required for participants.
We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.

  • Adobe Connect Systems Test
    • Note: when you follow the link to install and perform the test, some software will install, although it may look as if nothing happens. To check, simply re-run the test.

Agenda

Schedule - subject to change.

Course Overview - Day 1 - Tuesday September 15th
8:15-8:30 Check-in, name tags, log in, connect to wireless and Adobe Connect All
8:30-9:15 Welcome, Logistics, Intro to the Workshop, Why Share Data? Why this workshop?

Why this Workshop?, part 2

quick exercise - what are your data challenges? what software do you use?
key point - why share data?
Deb Paul, Amber Budden
9:15-9:35 General Concepts and Best Practices
the data life-cycle, brief introduction to data modeling, and relational databases
Ed Gilbert and Amber Budden
9:35-9:55 Overview of Data standards
Darwin Core, EML, Audubon Core, GGBN, DwC-A, Identifiers (GUIDs vs local)
Ed Gilbert, Deb Paul
10:00-10:30 Introduction to Mapping Data
hands-on exercise with occurrence specimen data set
data set with known mapping / standardization issues.
Darwin Core Terms
Sample Data
Known Issues in Sample Data
All
10:30-10:50 Break
10:50-11:30 Data Management Planning
choosing a collection management system, data flow, data backup, field-to-database, metadata
Amber Budden and Joanna McCaffrey
11:30-12:00 DataONE Lesson 4
best practices for data entry and data manipulation
Amber Budden
12:00-1:00 Lunch (Provided by Panera)
1:00-1:30 Images and media issues: a brief intro
choosing a camera, issues across different database platforms, image submissions, linking images to occurrence records, batch processing, digital asset management systems (DAMS)
Ed Gilbert and Joanna McCaffrey
1:30-2:00 Digitization workflows and process: Common Workflows and Optimization
getting started, prioritization, specimen collecting, new database, and integrating old data.
Image to data, specimen to data, to-the-web and skeletal records.
Reviewing your own workflow, common bottlenecks, policy, documentation
Katja Seltmann, Deb Paul & Ed Gilbert
2:00 - 3:00 Collections Tours and Symbiota Demo. (groups of 10)
Digitization in Action: Insects, Botany, Symbiota
All
3:00-3:20 Break
3:20-3:50 Georeferencing Data (Georeferencing Workflow)
visualization tools, when to georeference, best practices (the importance of standards): error uncertainty, georeferencingRemarks
Ed Gilbert
3:50-4:10 GEOLocate Exercise (May be DEMO)
CoGe, GPS Visualizer, re-integration, qc
Folks can preregister for GEOLocate Collaborative Georeferencing using the link below. Doing so will automatically register them for the Phoenix community project that Ed created. If you already have a login, you can use the link to register your existing account with the Phoenix project.
http://www.museum.tulane.edu/coge/WebComEasySignUp.aspx?ajc=915E2056
Ed Gilbert
4:40-5:30 Conversation, overview of day, preview for tomorrow, backpack logistics for tomorrow, ... All
Course Overview - Day 2 - Wednesday September 16th
8:30-12:00 Desert Botanical Garden (DBG) Field Trip and Lunch
meet at 7:55 in Hotel Lobby, depart at 8:00 and 8:30 for DBG; garden from 9-11:30, lunch 11:30 - 12:30, aim to depart 12:00 and 12:30 to ASU. Bring a hat!
11:30-12:30 Lunch at Gertrude's (in the Garden) YUM!
2:00-2:35 Welcome Back and Intro to Data Quality
inside the data-life-cycle, cost of data quality, quality vs completeness
Amber Budden, Greg Riccardi, (Ed Gilbert)
2:35-2:45 Review Tools for Data Cleaning, Data Manipulation, and Visualization (and Lessons)
Spreadsheets, Kurator, GPS Visualizer, GEOLocate, CoGe, Google Maps, CartoDB, Google Fusion Tables, Notepad++, Open Refine, BioVeL, Access, (others), iDigBio recordset data cleaning, iPlant TNRS, RegEx
Where do they fit in your workflow?
Deb Paul
2:45-2:50 Data Cleaning
where, when and how does it happen?, what kind of feedback to expect
types of common errors and omissions, best practices strategies, feedback and annotation, error tracking, automation, policies and protocols
Deb Paul & Katja Seltmann
2:50-3:40 Data Cleaning Exercise I
better spreadsheet skills (Data Carpentry)
http://idigbio.github.io/spreadsheet-skills/00-intro.html
Katja Seltmann & Deb Paul
3:40-4:00 Break
4:00 - 5:00 Data Cleaning Exercise I
better spreadsheet skills (Data Carpentry), continued...
Katja Seltmann & Deb Paul
5:00-5:15 Data Cleaning Exercise II
Open Refine, part I (facets, clustering)
https://idigbio.github.io/open-refine/00-getting-started.html
https://wiki.biovel.eu/display/doc/Installing+and+running+DR+Workflow+on+Taverna+Workbench#InstallingandrunningDRWorkflowonTavernaWorkbench-InstallingGoogleRefine
http://multimedia.journalism.berkeley.edu/tutorials/google-refine-export-json/
Deb Paul
5:15-5:30 Conversation, overview of day for context and questions, homework and preview for tomorrow... Deb Paul & Katja Seltmann
Evening Activity (opt) Insect Collecting Opportunity
Sign Up and Details - Wednesday night insect collecting trip to Mesquite Wash
Pictures Please!
Host - Nico Franz
Course Overview - Day 3 - Thursday September 17th
8:30-9:00 Discussion of Material Covered so far, Overview of Day 3, Set up breakout groups Katja Seltmann
9:00-10:00 Potential break out groups
Taxonomic Names issues - TNRS, ECAT
GEOLocate, CoGe, Georeferencing Workflows, Workshops
Data Cleaning: what is scripting? what is regex? examples in Open Refine, possibly in Symbiota
your own data issues / requests
Refine, part II (Using APIs, Taxonomic Name Resolution Services)
More about choosing software / and the "build-your-own" scenario
More about identifiers
More on imaging issues (what camera to purchase, etc.)
On OCR, NLP, duplicate harvesting
DataONE Data Management Planning Tool
What is Data Carpentry?
Text Editors
rAPI
All
10:00-10:35 Break
10:35 - 10:55 Sharing Data: Preparing and Moving Data to the Internet
making data useful, understandable in the outside world, properties, values and being systematic
Greg Riccardi
10:55-11:20 Data Publishing: in the context of the data life cycle
benefits, concerns, aggregators, citation, attribution
VertNet Norms For Data Use and Publication
Anne Thessen, http://datadetektiv.com/
11:20-11:40 Getting Your Data Published: Sending Data to iDigBio
from you to us, the details, the options
Joanna McCaffrey
12:00-1:00 Lunch (Provided by Panera)
1:00-1:45 Feedback from iDigBio as part of the Data Life Cycle and an iDigBio Portal Exercise
iDigBio Data Management and Recordset Data Quality
Webinar coming up - Improving Data Quality: iDigBio Recordset data cleaning method, tools, and data flags October 23rd, 2015
Using the iDigBio Portal and integrated research tools (PhyloJive, LifeMapper)
https://goo.gl/gyRwx7
http://idigbio.github.io/spreadsheet-skills/09-iDigBio-portal.html
Kevin Love, Katja Seltmann and Deb Paul
1:45-2:05 Copyright / Intellectual Property
VertNet Guide to Copyright and Licenses for Dataset Publication
VertNet Norms
iDigBio Terms of Use and Citation
David Bloom, Jonathan Rees, Greg Riccardi
3:00-3:20 Break
3:20-5:00 Second round of break-out groups
DwC-A publishing Exercise (or DEMO): using IPT instance
Sample Dataset
your email and "password"
http://iptworkshop.idigbio.org/ (your email prefix)
Symbiota DwC-A mapping and publishing exercise
others
Edward Gilbert
5:00-5:30 Closing topics
What are your next steps for moving forward?
guided discussion, survey, and thanks!, ...
Katja Seltmann & all

Logistics

Adobe Connect Access

Adobe Connect will be used so that everyone, including remote participants, can access and listen to the lectures.

Workshop Documents, Presentations, and Links

Pre-Workshop Reading List

Links beneficial for review

Workshop Recordings

Day 1

Day 2

Day 3

Resources and Links

Digitization Training Workshops Wiki Home