Data Carpentry: Difference between revisions

From iDigBio
 
(67 intermediate revisions by 7 users not shown)
Line 1: Line 1:
[[Category: Data carpentry]][[Category: Biodiversity informatics]]
{| class="wikitable" style="float:right; margin-left: 10px;"
This wiki supports the Data Carpentry Workshop to be held at the University of Florida at iDigBio September 29-30, 2014. It is the first in a series of four biodiversity informatics workshops planned in the upcoming year (2014-2015).
! colspan="2" style="background:#D58B28;width:200px;font-size:10pt" |Data Carpentry
==[[Digitization Training Workshops|Digitization Training Workshops Wiki Home]]==
|-
 
| colspan="2" style="text-align:center;font-size:7pt" |<!--YOU CAN INSERT A NEW IMAGE FOR THE LOGO BETWEEN THE COLON AND THE PIPE--> [[Image:ReproducibleWorkflows.png|center|400px|Image:ReproducibleWorkflows.png]]<br />
== Planning Team ==
|-
 
!colspan="2" style="background:#D58B28;text-align:center;font-size:9pt" | Quick Links for Data Carpentry
[mailto:francois.michonneau@gmail.com François Michonneau] (FLMNH - iDigBio), Katja Seltmann (TTD-TCN, AMNH), Matt Collins (ACIS - iDigBio), Dan Stoner (ACIS - iDigBio), [mailto:dpaul@fsu.edu Deborah Paul] (FSU - iDigBio), Tracy K. Teal (BEACON), Pam Soltis (FLMNH - iDigBio PI), Derek Masaki (USGS - BISON), Shari Ellis (iDigBio), Kevin Love (iDigBio), Mike Smorul (SESYNC), Juliet Pulliam (UF), Ming Tang (Tommy) (UF), and assistance from Nirav Merchant at iPlant.
|-
|[https://www.idigbio.org/wiki/index.php/Data_Carpentry#Agenda Data Carpentry Agenda]
|-
|[https://www.idigbio.org/biblio?f%5bkeyword%5d=436 Data Carpentry Biblio Entries]
|-
|[https://www.idigbio.org/wiki/index.php/Data_Carpentry#Link_to_Workshop_Report Data Carpentry Report]
|}
[[Category:Workshop]][[Category: Data carpentry]][[Category: Biodiversity informatics]]
This wiki supports the Data Carpentry Workshop held simultaneously at the University of Florida at iDigBio and AMNH on September 29-30, 2014. It is the first in a series of four biodiversity informatics workshops planned in the upcoming year (2014-2015) for iDigBio. The next workshop in this series is [https://www.idigbio.org/wiki/index.php/Data_Sharing_Data_Standards_and_Demystifying_the_IPT Data Sharing, Data Standards, and Demystifying the IPT] (January 13-14, 2015).


== General Information ==
== General Information ==
Line 28: Line 36:
http://datacarpentry.github.io/2014-09-29-iDigBio/
http://datacarpentry.github.io/2014-09-29-iDigBio/


==About==
=== Planning Team ===
 
[mailto:francois.michonneau@gmail.com François Michonneau] (FLMNH - iDigBio), Katja Seltmann (TTD-TCN, AMNH), Matt Collins (ACIS - iDigBio), Dan Stoner (ACIS - iDigBio), [mailto:dpaul@fsu.edu Deborah Paul] (FSU - iDigBio), Tracy K. Teal (BEACON), Pam Soltis (FLMNH - iDigBio PI), Derek Masaki (USGS - BISON), Shari Ellis (iDigBio), Kevin Love (iDigBio), Mike Smorul (SESYNC), Juliet Pulliam (UF), Ming Tang (Tommy) (UF), and assistance from Nirav Merchant at iPlant.
 
===About===


'''Instructors:''' François Michonneau (FLMNH - iDigBio), Tracy Teal (MSU - BEACON), Matt Collins (ACIS - iDigBio), Katja Seltmann (TTD-TCN, AMNH)
'''Instructors:''' François Michonneau (FLMNH - iDigBio), Tracy Teal (MSU - BEACON), Matt Collins (ACIS - iDigBio), Katja Seltmann (TTD-TCN, AMNH)
Line 48: Line 60:
Tuition for the course is free, but prior registration is required for attending. You can register [https://www.idigbio.org/content/data-carpentry-workshop-idigbio here.]
Tuition for the course is free, but prior registration is required for attending. You can register [https://www.idigbio.org/content/data-carpentry-workshop-idigbio here.]


==Workshop Evaluation==
===Workshop Evaluation===
* link to pre-workshop survey
* link to pre-workshop survey
* link here at end of workshop
* [http://www.idigbio.org/sites/default/files/workshop-presentations/data-carpentry-idigbio/DataCarpentrySeptember2014Evaluation.pdf Post Data Carpentry Workshop Survey Results]


==Software Installation Details==
===Software Installation Details===
'''To Do's, before the workshop.'''<br/><br/>
Software needed for Data Carpentry Workshop at iDigBio
Software needed for Data Carpentry Workshop at iDigBio
*[http://datacarpentry.github.io/2014-09-29-iDigBio/ Data Carpentry software installation instructions]
*[http://datacarpentry.github.io/2014-09-29-iDigBio/ Data Carpentry software installation instructions]
*Before the workshop, you must confirm (RSVP) that the required software is installed. Instructors are available to help - see your email for their contact information.


We will be using Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the '''Adobe Connect Add-In''' to participate in the workshop.
We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the '''Adobe Connect Add-In''' to participate in the workshop.
*[http://idigbio.adobeconnect.com/common/help/en/support/meeting_test.htm Adobe Connect Systems Test]
*[http://idigbio.adobeconnect.com/common/help/en/support/meeting_test.htm Adobe Connect Systems Test]


==Agenda==
==Agenda==
*[https://idigbio.adobeconnect.com/e4r9cm91cjg/event/login.html?campaign-id=dc2014pt1 AdobeConnect #datacarpentry Room]
*[https://idigbio.adobeconnect.com/e4r9cm91cjg/event/login.html?campaign-id=dc2014pt1 AdobeConnect #datacarpentry Room]
*Before the workshop, you must confirm (RSVP) that the required software is installed.
*Pre-workshop meeting and dinner at Piesano's, 6 PM Sunday September 28th. [https://www.google.com/maps/preview?ll=29.652657,-82.338597&z=15&t=m&hl=en-US&gl=US&mapclient=embed&q=1250+W+University+Ave+Gainesville,+FL+32601 Piesano's] is at NW 13th St. and 1250 W. University Ave. in Gainesville. All welcome.
**Instructors are available to help - see your email for their contact information
*Pre-workshop meeting and dinner at Piesano's, 6 PM Sunday September 28th. [https://www.google.com/maps/preview?ll=29.652657,-82.338597&z=15&t=m&hl=en-US&gl=US&mapclient=embed&q=1250+W+University+Ave+Gainesville,+FL+32601 Piesano's] is at NW 13th St. and 1250 W. University Ave. in Gainesville.
**All welcome.


{| class="wikitable" style="width: 55%;"
{| class="wikitable" style="width: 55%;"
!colspan="3"| Course Overview - Day 1
!colspan="3"| Course Overview - Day 1
|-
|-
|8:30-9:00
|8:30-8:45
| Introductions / Overview / Why Data Carpentry? / How to organize data projects
| Introductions & Overview, [http://www.idigbio.org/sites/default/files/workshop-presentations/data-carpentry-idigbio/datacarpentryidigbio2014.pptx Data Carpentry: Making data science more efficient]
| All
| All, Deb Paul
|-
|8:45-9:00
|[https://www.idigbio.org/sites/default/files/workshop-presentations/data-carpentry-idigbio/iDigBioDataCarpentryPSoltis.pptx Linking Heterogeneous Data in Biodiversity Studies: the need for data carpentry]
| Pam Soltis, iDigBio PI
|-
|-
|9:00-10:00
|9:00-10:00
Line 121: Line 134:
|-
|-
| style="background-color: #eee;" |12:00-1:30
| style="background-color: #eee;" |12:00-1:30
| style="background-color: #eee;" | Lunch (Demo)
| style="background-color: #eee;" | Lunch
| style="background-color: #eee;" |  
| style="background-color: #eee;" |  
|-
|-
Line 148: Line 161:


==Link to Workshop Report==
==Link to Workshop Report==
[https://www.idigbio.org/content/data-carpentry-please-can-we-have-some-more Data Carpentry - Please can we have some more?!]


==Logistics==
==Logistics==
*[https://www.idigbio.org/wiki/images/f/ff/DataCarpentry_logistics.pdf Logistics]
*[[Media:DataCarpentry_logistics.pdf |Logistics & Hotel Information (for the out-of-towners)]]
*[https://www.idigbio.org/wiki/images/f/ff/DataCarpentry_logistics.pdf Hotel Information (for the out-of-towners)]
*[https://www.idigbio.org//sites/default/files/workshop-presentations/data-carpentry-idigbio/AreaRestaurants.doc Where to find food]
*[https://www.idigbio.org//sites/default/files/workshop-presentations/data-carpentry-idigbio/AreaRestaurants.doc Where to find food]
*[https://www.idigbio.org/content/data-carpentry-workshop-idigbio Workshop Calendar Announcement]
*[https://www.idigbio.org/content/data-carpentry-workshop-idigbio Workshop Calendar Announcement]
*[[Media:Datacarpentry-registrations-Sep9-web.pdf |Participant List]]


==Adobe Connect Access==
===Adobe Connect Access===
Adobe Connect will be used to provide access for a remote classroom at the American Museum of Natural History. Workshop participants are encouraged to log in to the Adobe Connect room to facilitate sharing with this remote group:
Adobe Connect will be used to provide access for a remote classroom at the American Museum of Natural History. Workshop participants are encouraged to log in to the Adobe Connect room to facilitate sharing with this remote group:
Already registered and accepted?
Already registered and accepted?
Line 162: Line 176:


==Presentation Documents and Links==
==Presentation Documents and Links==
*links to any presentations (like PowerPoint slides) here.
Links to any presentations (like PowerPoint slides) here.
*[http://www.idigbio.org/sites/default/files/workshop-presentations/data-carpentry-idigbio/datacarpentryidigbio2014.pptx Data Carpentry: Making data science more efficient] Presenter: Deb Paul, iDigBio
*[https://www.idigbio.org/sites/default/files/workshop-presentations/data-carpentry-idigbio/iDigBioDataCarpentryPSoltis.pptx Linking Heterogeneous Data in Biodiversity Studies: the need for data carpentry] Presenter: Pam Soltis, iDigBio PI
*[https://github.com/datacarpentry/2014-09-29-iDigBio GitHub repository for Data Carpentry Workshop at iDigBio]
*[https://github.com/datacarpentry/2014-09-29-iDigBio GitHub repository for Data Carpentry Workshop at iDigBio]
*[https://datacarpentry.etherpad.mozilla.org/13 EtherPad for Data Carpentry at iDigBio: workshop notes]
*[https://datacarpentry.etherpad.mozilla.org/13 EtherPad for Data Carpentry at iDigBio: workshop notes]
*[https://github.com/datacarpentry/2014-09-29-iDigBio/blob/master/cheatsheets/sql.pdf?raw=true SQL commands list]
*[https://github.com/datacarpentry/2014-09-29-iDigBio/blob/master/cheatsheets/shell.pdf?raw=true SHELL commands list]
*[http://datacarpentry.github.io/2014-09-29-iDigBio/lessons/R/ R lesson]
*[http://openrefine.org/ Getting started with Open Refine]


==Workshop Recordings==
==Workshop Recordings==
====Day 1====
====Day 1====
*[8:30am-10:00am]
*8:30am-10:00am http://idigbio.adobeconnect.com/p7gqmvndl5a/
*[10:30am-12:00pm]
*10:30am-12:00pm http://idigbio.adobeconnect.com/p6ipl39xnl3/
*[1:00pm-3:00pm]
*12-12:30 http://idigbio.adobeconnect.com/p4dsgo0y77y/
*[3:30-6pm]
*1:00pm-5:00pm http://idigbio.adobeconnect.com/p4rm50ukzv5/


====Day 2====
====Day 2====
*[8:30am-10:00am]
*8:30am-10:00am http://idigbio.adobeconnect.com/p2crpl68wr4/
*[10:30am-12:00pm]
*10:30am-12:00pm http://idigbio.adobeconnect.com/p1e150mot75/
*[1:00pm-3:00pm]
*1:30pm-5:00pm http://idigbio.adobeconnect.com/p4t16mcnvvk/
*[3:30-6pm]


==Data Carpentry Resources and Links==
==Data Carpentry Resources and Links==
*[http://nescent.github.io/2014-05-08-datacarpentry/  Inaugural Data Carpentry Workshop] by Tracy K. Teal
*[http://software-carpentry.org/blog/2014/05/our-first-data-carpentry-workshop.html Our First Data Carpentry Workshop] by Karen Cranston
*[https://www.idigbio.org/content/tales-data-carpentry-workshop-demand Tales from the First Data Carpentry Workshop] by Deb Paul
*[https://github.com/datacarpentry/2014-09-29-iDigBio Data Carpentry Materials on GitHub]
*[https://github.com/datacarpentry/2014-09-29-iDigBio Data Carpentry Materials on GitHub]
*[http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003542 Ten Simple Rules for the Care and Feeding of Scientific Data. Goodman et al]
*[http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003542 Ten Simple Rules for the Care and Feeding of Scientific Data. Goodman et al]
*[http://faculty.chicagobooth.edu/matthew.gentzkow/research/CodeAndData.pdf Code and Data for the Social Sciences: A Practitioner's Guide. Matthew Gentzkow, Jesse M. Shapiro Chicago Booth and NBER March 10, 2014]
*[http://faculty.chicagobooth.edu/matthew.gentzkow/research/CodeAndData.pdf Code and Data for the Social Sciences: A Practitioner's Guide. Matthew Gentzkow, Jesse M. Shapiro Chicago Booth and NBER March 10, 2014]
*[http://dx.doi.org/10.4033/iee.2013.6b.6.f Nine simple ways to make it easier to (re)use your data. White et al.]
*[http://dx.doi.org/10.4033/iee.2013.6b.6.f Nine simple ways to make it easier to (re)use your data. White et al.]
*You want to learn SQL independently? Try [http://www.headfirstlabs.com/books/hfsql/ Head First SQL]
*[http://shop.oreilly.com/product/9780596807702.do Head First Excel, O'Reilly]
*Check out [https://www.dataone.org/ DataONE]
**They've got a great [https://www.dataone.org/software_tools_catalog Software Tools Catalog]
*Put standard metadata with your data. Wondering how to do that? Check out DataONE's Morpho Tool, available under the tools menu at https://knb.ecoinformatics.org/.
**Why? It makes your data reusable and, better still, discoverable. Get cited for your datasets in addition to your published papers!
*Using Open Refine? Want to compare your taxon names against a standard list? Try this reconciliation service.
**Read Gaurav's blog post first: http://gbif.blogspot.com/2013/07/validating-scientific-names-with.html
**Then, give it a try. The Google+ Open Refine community will help you figure it out (it's not hard).
===Links from You===
*How about you? Got a favorite resource (a book, a website) to share with your classmates?
*[http://datascienceatthecommandline.com/ Data Science at the Command Line]
*[http://www.it.ufl.edu/training/ Free Training Resources for UF students, faculty, and staff] UF provides free access to over 2600 online training courses through Lynda.com.  Does your institution have similar free training opportunities?
===Related Blog Posts and Photos===
*[http://nescent.github.io/2014-05-08-datacarpentry/  Inaugural Data Carpentry Workshop] by Tracy K. Teal
*[http://software-carpentry.org/blog/2014/05/our-first-data-carpentry-workshop.html Our First Data Carpentry Workshop] by Karen Cranston
*[https://www.idigbio.org/content/tales-data-carpentry-workshop-demand Tales from the First Data Carpentry Workshop] by Deb Paul, May 2014
*[https://www.idigbio.org/content/data-carpentry-please-can-we-have-some-more Data Carpentry, Please can we have some more?!] by Deb Paul, 15 Oct 2014
*[https://www.facebook.com/media/set/?set=a.790950270948921.1073741844.215120891865198&type=3 Data Carpentry Facebook Photo Album]
==[[Digitization Training Workshops|Digitization Training Workshops Wiki Home]]==

Latest revision as of 15:17, 18 February 2015

Data Carpentry

This wiki supports the Data Carpentry Workshop held simultaneously at the University of Florida at iDigBio and AMNH on September 29-30, 2014. It is the first in a series of four biodiversity informatics workshops planned in the upcoming year (2014-2015) for iDigBio. The next workshop in this series is Data Sharing, Data Standards, and Demystifying the IPT (January 13-14, 2015).

General Information

Data Carpentry's aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain.

Our curriculum includes:

  • Day 1 morning: Better spreadsheet skills and introduction to more powerful tools
  • Day 1 afternoon: Introduction to databases, combining and querying data using SQL
  • Day 2 morning: Introduction to the shell, Introduction to R and managing data in R
  • Day 2 afternoon: Collaborative data management & publishing data

The concepts, skills, and tools we teach are domain-independent, but example problem cases and datasets will be taken from organismal and evolutionary biology, biodiversity science, ecology, and environmental science.

Data Carpentry's teaching is hands-on, so participants are required to bring their own laptops. (We will provide instructions on setting up the required software several days in advance.) There are no prerequisites, and we will assume no prior knowledge about the tools.

Updates will be posted to this website as they become available.

Software installation requirements and additional information are available on the project's GitHub website:

http://datacarpentry.github.io/2014-09-29-iDigBio/

Planning Team

François Michonneau (FLMNH - iDigBio), Katja Seltmann (TTD-TCN, AMNH), Matt Collins (ACIS - iDigBio), Dan Stoner (ACIS - iDigBio), Deborah Paul (FSU - iDigBio), Tracy K. Teal (BEACON), Pam Soltis (FLMNH - iDigBio PI), Derek Masaki (USGS - BISON), Shari Ellis (iDigBio), Kevin Love (iDigBio), Mike Smorul (SESYNC), Juliet Pulliam (UF), Ming Tang (Tommy) (UF), and assistance from Nirav Merchant at iPlant.

About

Instructors: François Michonneau (FLMNH - iDigBio), Tracy Teal (MSU - BEACON), Matt Collins (ACIS - iDigBio), Katja Seltmann (TTD-TCN, AMNH)

Assistants: Dan Stoner (ACIS - iDigBio), Deborah Paul (FSU - iDigBio), Pam Soltis (FLMNH - iDigBio PI), Derek Masaki (USGS), Shari Ellis (iDigBio), Kevin Love (iDigBio), Juliet Pulliam (UF), Tommy Tang (UF), Bernardo Santos (AMNH), Jonathan Foox (AMNH)

Who: The course is aimed at graduate students, postdocs, research staff, and other researchers.

Skill Level: Beginner. Data Carpentry courses are meant for novices.

Where: iDigBio in Gainesville, FL, and AMNH in New York City (via teleconference)

Requirements: Participants must bring a laptop with a few specific software packages installed. If you will be travelling from out of town, you will need to make your own travel arrangements.

Contact: Please email data-carpentry@software-carpentry.org for questions and information not covered here.

Twitter: #datacarpentry

Tuition for the course is free, but prior registration is required for attending. You can register here.

Workshop Evaluation

Software Installation Details

Software needed for Data Carpentry Workshop at iDigBio

We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.

Agenda

Course Overview - Day 1
8:30-8:45 | Introductions & Overview, Data Carpentry: Making data science more efficient | All, Deb Paul
8:45-9:00 | Linking Heterogeneous Data in Biodiversity Studies: the need for data carpentry | Pam Soltis, iDigBio PI
9:00-10:00 | Better use of spreadsheets, part I | Tracy Teal
10:00-10:30 | Break |
10:30-12:00 | Better use of spreadsheets part II | Tracy Teal
12:00-1:30 | Lunch (with OpenRefine Demo) | Deb Paul
1:30-3:00 | SQL Introduction | Matt Collins
3:00-3:30 | Break |
3:30-5:00 | SQL part II | Matt Collins
5:00-5:30 | Review / Wrap up for tomorrow |

Course Overview - Day 2
8:30-10:00 | Introduction to the shell | Tracy Teal
10:00-10:30 | Break |
10:30-12:00 | Introduction to R | François Michonneau
12:00-1:30 | Lunch |
1:30-3:00 | Manipulating and plotting data in R | François Michonneau
3:00-3:30 | Break |
3:30-4:30 | Getting data in and out of R: How to integrate R in your workflow | François Michonneau
4:30-5:00 | Sharing your data and your results: RMarkdown and Figshare | François Michonneau
5:00-5:30 | Review / Wrap up / Evaluation and Feedback |

Future plan: Scaling it up: Demo using the iPlant Discovery Environment (DE)

Link to Workshop Report

Data Carpentry - Please can we have some more?!

Logistics

Adobe Connect Access

Adobe Connect will be used to provide access for a remote classroom at the American Museum of Natural History. Workshop participants are encouraged to log in to the Adobe Connect room to facilitate sharing with this remote group:

Already registered and accepted?

Presentation Documents and Links

Links to any presentations (like PowerPoint slides) here.

Workshop Recordings

Day 1

Day 2

Data Carpentry Resources and Links

Links from You

Related Blog Posts and Photos

Digitization Training Workshops Wiki Home