Difference between revisions of "IDigBio Augmenting OCR Workshop"

From iDigBio
m (Hackathon Issues)
 
(47 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
{| class="wikitable" style="float:right;"
 +
! colspan="2" style="background:#D58B28;width:200px;font-size:10pt" | Digitizing the Past and Present for the Future
 +
|-
 +
| colspan="2" style="text-align:center;font-size:7pt" |<!--YOU CAN INSERT A NEW IMAGE FOR THE LOGO BETWEEN THE COLON AND THE PIPE--> [[Image:IDigBio Logo RGB.png|center|300px|iDigBio Logo RGB.png]]<br />
 +
|-
 +
!colspan="2" style="background:#D58B28;text-align:center;font-size:9pt" | Quick Links for Augmenting OCR Workshop
 +
|-
 +
|[http://tinyurl.com/AOCRTwoDayAgenda Augmenting OCR Workshop Agenda]
 +
|-
 +
|[https://www.idigbio.org/biblio?f%5bkeyword%5d=61 Augmenting OCR Workshop Biblio Entries]
 +
|-
 +
| Augmenting OCR Workshop Report
 +
|}
 
== Overview Augmenting OCR Workshop ==
 
<br>
+
 
 +
*Note this page is undergoing frequent updates as the workshop approaches.
 
The Augmenting OCR Working Group, the iDigBio IT staff, and invited guests meet October 1-2, 2012 in Gainesville, Florida for a two-day intensive workshop to plan a hackathon and concurrent workshop, assemble iDigBio Wiki content from collective knowledge about OCR in digitization workflows, and learn about the latest developments in OCR and NLP from all invited participants.
  
Karl-Heinz Steinke from Hannover, Germany and the [http://www.hs-hannover.de/forschung/forschungsschwerpunkte/herbar-digital/index.html Herbar Digital] project is our key speaker. Karl-Heinz' group has been working for the last 5 years on improving OCR algorithms for recognizing handwriting and on OCR algorithms in general as part of making digitization of herbarium specimens more efficient. See [http://www.tdwg.org/proceedings/article/view/298 Feature recognition for herbarium specimens (Herbar-Digital)] to learn more about this project's work.  
+
Karl-Heinz Steinke from Hannover, Germany and the [http://www.hs-hannover.de/forschung/forschungsschwerpunkte/herbar-digital/index.html Herbar Digital] project is our key speaker. Karl-Heinz' group has been working for the last 5 years on improving OCR algorithms for recognizing handwriting and on OCR algorithms in general as part of making digitization of herbarium specimens more efficient. See [http://www.tdwg.org/proceedings/article/view/298 Feature recognition for herbarium specimens (Herbar-Digital)] and [http://www.fakultaet2.fh-hannover.de/organisation/labore/fertigungsautomatisierung-digitale-fabrik/forschung-entwicklung/herbar-digital/herbar%20digital%202012 Herbar Digital 2012] for the latest about this project's work.  
  
 
A hackathon is planned for February 2013 at the Botanical Research Institute of Texas (BRIT), where we aim to advance what OCR, ML, and NLP can do to make our digitization efforts more efficient, both in producing data faster and in producing data that is fit for use. We'll choose the hackathon focus and design the hackathon together with the iDigBio IT staff at the upcoming October workshop.
Line 10: Line 24:
  
 
== Logistics ==
 
*[https://www.idigbio.org/wiki/index.php/Augmenting_OCR_Logistics Find details here for lodging, meals, maps], reimbursement information, hotel shuttle details and more.
+
*[[Augmenting OCR Logistics|Find details here for lodging, meals, maps]], reimbursement information, hotel shuttle details and more.
  
 
== Participants ==
 
*See a [http://tinyurl.com/AugmentingOCRAttendeeList list of meeting participants] including working group members, iDigBio staff, invited guests and remote participants.
+
*See a [http://tinyurl.com/AugmentingOCRAttendeeList|list of meeting participants] including working group members, iDigBio staff, invited guests and remote participants.
 +
*[https://docs.google.com/document/d/1YUmzSX1Bi-Hdsgl3cLix7xLQIaOwz70ftHB5H_bHQb0/edit Workshop Attendee List]
  
 
== Attend Remotely ==
 
*Click [http://idigbio.adobeconnect.com/augmentocr/ Augment OCR] to attend remotely via Adobe Connect.
+
*Click http://idigbio.adobeconnect.com/augmentocr/Augment_OCR to attend remotely via Adobe Connect.
 
*Please sign in 15 - 20 minutes before the session of interest to learn how Adobe Connect works.
 
  
Line 24: Line 39:
 
*[http://tinyurl.com/AOCRTwoDayAgenda Agenda by Day]
 
 
*[http://tinyurl.com/OCRHackathonWishList Hackathon Topic List - "our wish list"]
 
Priority Issues Outlined
+
*[http://tinyurl.com/AOCRpriorities A-OCR Priority Issues - Choosing Metrics to define success]
  
== Taking Notes ==
+
== Workshop Presentations ==
*[http://tinyurl.com/AugmentingOCRgroupNotes Take Meeting Notes Here]
+
 
 +
*[https://www.idigbio.org/sites/default/files/workshop-presentations/aocr-wgw/Steinke_florida2.ppt Karl-Heinz Steinke, Hochschule Hannover: Image analysis of herbarium specimens (Herbar-Digital)]
 +
**[https://vimeo.com/51934233 Karl-Heinz Steinke, Hochschule Hannover: Image analysis of herbarium specimens (Herbar-Digital) - Video]
 +
*[https://www.idigbio.org/sites/default/files/workshop-presentations/aocr-wgw/Haston_OCR%20within%20the%20digitisation%20workflow%20at%20RBGE.pptx#overlay-context=home Elspeth Haston, Ron Cubey, Robin Drinkwater, Royal Botanic Garden Edinburgh: OCR within the digitisation workflow at RBGE]
 +
*Jason Best, BRIT: Demo of The Apiary Project, Walk-through and Challenges
 +
*[https://www.idigbio.org/sites/default/files/workshop-presentations/aocr-wgw/Gilbert_OcrWorkshop.pptx Edward Gilbert, Corinna Gries, Thomas H. Nash III, Robert Anglin, LBCC TCN: Lichens, Bryophytes and Climate Change]
 +
*[https://www.idigbio.org/sites/default/files/workshop-presentations/aocr-wgw/Watson-Tri-Trophic-Digitization-OCR.pptx Kimberly Watson, NYBG: Tri-Trophic Digitization: Putting the OCR in Workflow]
 +
*[https://www.idigbio.org/sites/default/files/workshop-presentations/aocr-wgw/gottschalk_gainesville.pptx Stephen Gottschalk, NYBG: OCR implementation in The Caribbean Plants Digitization Project, a project to image and catalog over 150,000 Caribbean specimens at the New York Botanical Garden]
 +
*[https://www.idigbio.org/sites/default/files/workshop-presentations/aocr-wgw/Lafferty_Gainesville.ppt Daryl Lafferty, Arizona State University: OCR and SALIX Parsing]
 +
*Qianjin Zhang, Bryan Heidorn, University of Arizona: HERBIS/LABELX Demonstration - Parsing OCR Output
 +
*[https://www.idigbio.org/sites/default/files/workshop-presentations/aocr-wgw/Oboyski-CalBug_OCR.pptx Peter Oboyski, Phuc Nguyen, Serge Belongie, Rosemary Gillespie, Essig Museum: Digitization California Arthropod Collection - CalBug]
 +
 
 +
== Workshop Bibliography  ==
 +
*[https://www.idigbio.org/biblio?f%5bkeyword%5d=61&s=type&o=asc Jump to Augmenting OCR Workshop Bibliography]
 +
 
 +
== Taking Notes ==
 +
 
 +
*[http://tinyurl.com/AugmentingOCRgroupNotes Workshop Meeting Notes]
  
 
== Hackathon Issues ==
 
 
Some reading materials that may prove useful.
 
*[http://www.slideshare.net/kcranstn/phylotastic-ievobio Hints from Phylotastic Hackation - see Slides 6 - 14] Phylotastic hackathon report from 2012 iEvoBio meeting
+
*[http://www.slideshare.net/kcranstn/phylotastic-ievobio Hints from Phylotastic Hackathon - see Slides 6 - 14] Phylotastic hackathon report from 2012 iEvoBio meeting
 
+
 
*[http://informatics.nescent.org/wiki/Hackathon_Whitepaper_Guidelines Nescent Hackathon Whitepaper Guidelines]
 
  
== ==
+
== Workshop Outcomes ==
 +
Our hackathon document - a work in progress from the workshop.
 +
*[http://tinyurl.com/aocrHack Call for Participation]
 +
Apply at: http://tinyurl.com/iDigBioAOCRHackathon

Latest revision as of 17:23, 3 February 2015

Digitizing the Past and Present for the Future

Quick Links for Augmenting OCR Workshop
Augmenting OCR Workshop Agenda
Augmenting OCR Workshop Biblio Entries
Augmenting OCR Workshop Report

Overview Augmenting OCR Workshop

  • Note this page is undergoing frequent updates as the workshop approaches.

The Augmenting OCR Working Group, the iDigBio IT staff, and invited guests meet October 1-2, 2012 in Gainesville, Florida for a two-day intensive workshop to plan a hackathon and concurrent workshop, assemble iDigBio Wiki content from collective knowledge about OCR in digitization workflows, and learn about the latest developments in OCR and NLP from all invited participants.

Karl-Heinz Steinke from Hannover, Germany and the Herbar Digital project is our key speaker. Karl-Heinz' group has been working for the last 5 years on improving OCR algorithms for recognizing handwriting and on OCR algorithms in general as part of making digitization of herbarium specimens more efficient. See Feature recognition for herbarium specimens (Herbar-Digital) and Herbar Digital 2012 for the latest about this project's work.

A hackathon is planned for February 2013 at the Botanical Research Institute of Texas (BRIT), where we aim to advance what OCR, ML, and NLP can do to make our digitization efforts more efficient, both in producing data faster and in producing data that is fit for use. We'll choose the hackathon focus and design the hackathon together with the iDigBio IT staff at the upcoming October workshop.

As part of our working group's outreach efforts, we are participating in the upcoming iSchools Conference in Fort Worth, Texas in February 2013 in three ways: submitting a poster, submitting a notes paper, and hosting a half-day workshop to showcase our work and seek out potential collaborators. The iSchools 2013 theme, Data-Innovation-Wisdom, lines up well with the goals of the ADBC, iDigBio, and the TCNs. This conference is concurrent with our hackathon at BRIT.

Logistics

Participants

Attend Remotely

Workshop Materials

Workshop Presentations

Workshop Bibliography

Taking Notes

Hackathon Issues

Some reading materials that may prove useful.

Workshop Outcomes

Our hackathon document - a work in progress from the workshop.

Apply at: http://tinyurl.com/iDigBioAOCRHackathon