Participant Related Projects: Difference between revisions



== Tolkin ==
Hackathon participant [[2013_Hackathon_Participants#Christopher_Dell| Chris Dell]] is associated with the [http://www.tolkin.org Tolkin Project]. Another Tolkin Informatics staff member, Elvis Wu, attended the AOCR Working Group meeting in October 2012 to help us plan the February 2013 hackathon. Reed Beaman, iDigBio Senior Personnel, is one of the Principal Investigators on this project.
From the website:
<blockquote>TOLKIN is an information management and analytical web application to provide informatics support for phylodiversity and biodiversity research projects. As a web-based application, TOLKIN is able to support collaborative projects by providing shared access to a variety of data on voucher specimens, taxonomy, bibliography, morphology, DNA samples and sequences.</blockquote>


== SALIX ==
AOCR working group member and hackathon participant [[2013_Hackathon_Participants#Daryl_Lafferty| Daryl Lafferty]] is the developer of the Semi-Automated Label Information Extraction System (SALIX).
*[http://daryllafferty.com/salix/ SALIX software and overview of features and function]
*[http://nhc.asu.edu/vpherbarium/canotia/SALIX3.pdf SALIX, the Semi-automatic Label Information Extraction system]. Online PDF describing the software.
 
== LABELX ==
[[2013_Hackathon_Participants#Byran_Heidorn| Bryan Heidorn]], chair of the AOCR working group, is the developer of the LABELX parser. This software is available on the AOCR VM set up for the hackathon. For more about LABELX, contact AOCR working group members [[2013_Hackathon_Participants#Byran_Heidorn| Bryan Heidorn]] and [[2013_Hackathon_Participants#Qianjin_Zhang| Qianjin Zhang]], or remote hackathon participant [[2013_Hackathon_Participants#Steven_Chong| Steven Chong]].
 
== Symbiota ==
AOCR working group member and hackathon participant [[2013_Hackathon_Participants#Edward_Gilbert | Edward Gilbert]] is lead developer of [http://symbiota.org/tiki/tiki-index.php Symbiota]. From the introduction on the website:
<blockquote>In this quickly changing world, there has developed a great necessity to learn about our world-wide biota at an increased rate. Scientists are predicting that future species declines will approach historical mass extinction levels within this century. We need to develop better tools to aid taxonomists, field biologists, and environmental educators. It is imperative that we increase our rate of conducting biological inventories, especially within the tropics, as well as steering youth toward becoming our future scientists. Symbiota web tools strive to integrate biological community knowledge and data in order to synthesize a network of databases and tools that will aid in increasing our overall environmental comprehension.</blockquote>


== BiSciCol ==
AOCR working group chair and hackathon participant [[2013_Hackathon_Participants#Bryan_Heidorn| Bryan Heidorn]] and iDigBio staff member Reed Beaman are part of the Biological Science Collections Tracker project. Read more about this endeavor at http://biscicol.blogspot.com/.
From their website:
<blockquote>BiSciCol (Biological Science Collections) Tracker is a funded NSF collaborative project with the goal of building an infrastructure designed to tag and track scientific collections and all of their derivatives.</blockquote>


== FromThePage ==
FromThePage is an open-source transcription tool developed by hackathon participant [[2013_Hackathon_Participants#Ben_Brumfield| Ben Brumfield]]. While mainly used to transcribe and index historical or literary material, it has also been used for herpetology field notes. Check out the shared instance at [http://fromthepage.com FromThePage.com], and read more about the world of crowdsourced transcription and OCR correction at [http://manuscripttranscription.blogspot.com Ben's blog].
 
From the website:
<blockquote>FromThePage is free software that allows volunteers to transcribe handwritten documents on-line. It's easy to index and annotate subjects within a text using a simple, wiki-like mark-up. Users can discuss difficult writing or obscure words within a page to refine their transcription. The resulting text is hosted on the web, making documents easy to read and search. </blockquote>


== DarwinCore Parser ==
The DarwinCore Parser comes to us from hackathon participant [[2013_Hackathon_Participants#Michael_Giddens| Michael Giddens]] and is written in Node.js. It is a very new tool that SilverBiology is developing for the OCR meeting and some internal projects.
See: https://github.com/silverbiology/dwc-parser
*Will be used to send OCR as text and:
**Get dates from a blob of text, with ratings
**Get Lat/Lng in various formats
**Wrapper for GlobalNames and GBIF Checklist Bank for name recognition
**Higher taxa lookup from GBIF Checklist Bank
**Wrapper for Python Lat/Lng format converter
**Type status detection
**Experimental: Using geos and GDAM for higher geography lookup and potentially any shapefile lookup, such as ecological data.
**more stuff.... looking for help...


== The Apiary Project ==
[[2013_Hackathon_Participants#Jason_Best| Jason Best]] is an AOCR working group member, hackathon participant, host for the hackathon at BRIT, and Co-PI on the Apiary Project. Find out more about Apiary at http://www.apiaryproject.org.
From the website:
<blockquote>Our study addresses this research problem: '''What workflow provides for a combination of machine-assisted and human-assisted procedures to most effectively and efficiently convert textual data on specimen labels into machine-processable parsed data to ingest in a database and associate with the digitized specimen? The goal of this project is to answer this question.'''</blockquote>
*The project goal will be accomplished through the following objectives:
**Identify and test machine processes for initial transformation of label data
**Identify human processes that act on the machine-transformed data to correct and enhance label data
**Develop, test, and assess user interfaces to support human processes
**Develop and test a workflow that incorporates both human- and machine-assisted procedures for effectiveness and efficiency in label data transformation and enhancement
**Assess quality of metadata resulting from machine and human processes


== Biodiversity Heritage Library ==
AOCR working group members and hackathon participants [[2013_Hackathon_Participants#William_Ulate|William Ulate]] and [[2013_Hackathon_Participants#John_Mignault| John Mignault]] both work on the Biodiversity Heritage Library Project. For an introduction to this project, see http://biodiversitylibrary.org. From the '''Who We Are''' section of the website's ''about'' page:
<blockquote>The Biodiversity Heritage Library (BHL) is a consortium of natural history and botanical libraries that cooperate to digitize and make accessible the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.” The BHL consortium works with the international taxonomic community, rights holders, and other interested parties to ensure that this biodiversity heritage is made available to a global audience through open access principles. In partnership with the Internet Archive and through local digitization efforts, the BHL has digitized millions of pages of taxonomic literature, representing tens of thousands of titles and over 100,000 volumes.
</blockquote>
 
== Macrofungi Collection Consortium TCN ==
Hackathon participant [[2013_Hackathon_Participants#Scott_Bates | Scott Bates]] manages the Symbiota (see above) data portal ([http://mycoportal.org mycoportal.org]) for the Macrofungi Collection Consortium (MaCC) TCN.
 
From the MaCC project summary:
<blockquote>Scientists in the United States have been studying and collecting macrofungi for the past 150 years, which has produced a legacy of some 1.4 million dried scientific specimens, in 35 institutions in 24 states. These institutions joined forces in an effort to digitize and share online data associated with these specimens. The resulting resource will enable a national census of macrofungi, and will allow researchers to better understand the diversity of these organisms.</blockquote>
Learn more about the Macrofungi Collection Consortium at the [https://sites.google.com/site/macrofungicollectionconsortium/ MaCC project page].
 
== ScioQualis.com / ScioTR ==
Hackathon participants [[2013_Hackathon_Participants#Paul_Schroeder | Paul Schroeder]] and [[2013_Hackathon_Participants#Robin_Schroeder | Robin Schroeder]] are co-founders of [http://scioqualis.com ScioQualis.com]. Scio Qualis is a cloud-based Natural History Collections Management SaaS offering hosted in the Windows Azure cloud. Paul and Robin are also developers of ScioTR, a touch-enabled Windows 8 Metro app designed for rapid data entry. ScioTR is a companion product that leverages OCR technology, is designed for crowdsourcing, and integrates with http://ScioQualis.com.
 
For the iDigBio hackathon, Paul and Robin are exploring a queue-based workflow process using Microsoft Service Bus from the Cloud. Use of the Advanced Message Queuing Protocol (AMQP) would ensure non-Windows applications have easy access (see http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol).  An AMQP hosted service bus can be accessed by developers regardless of whether they are using Java, PHP, Python, Node.js, C, Ruby, Perl, JavaScript or another language.
 
 
----


Back to the [[2013 AOCR Hackathon Wiki]]

Latest revision as of 13:48, 17 January 2013

Projects of Various Participants with Relevance to the Hackathon

Tolkin

Hackathon participant Chris Dell is associated with the Tolkin Project. Another Tolkin Informatics staff member, Elvis Wu, attended the AOCR Working Group meeting in October 2012 to help us plan the February 2013 hackathon. Reed Beaman, iDigBio Senior Personnel, is one of the Principal Investigators on this project. From the website:

TOLKIN is an information management and analytical web application to provide informatics support for phylodiversity and biodiversity research projects. As a web-based application, TOLKIN is able to support collaborative projects by providing shared access to a variety of data on voucher specimens, taxonomy, bibliography, morphology, DNA samples and sequences.

SALIX

AOCR working group member and hackathon participant Daryl Lafferty is the developer of the Semi-Automated Label Information Extraction System (SALIX).

LABELX

Bryan Heidorn, chair of the AOCR working group, is the developer of the LABELX parser. This software is available on the AOCR VM set up for the hackathon. For more about LABELX, contact AOCR working group members Bryan Heidorn and Qianjin Zhang, or remote hackathon participant Steven Chong.

Symbiota

AOCR working group member and hackathon participant Edward Gilbert is lead developer of Symbiota. From the introduction on the website:

In this quickly changing world, there has developed a great necessity to learn about our world-wide biota at an increased rate. Scientists are predicting that future species declines will approach historical mass extinction levels within this century. We need to develop better tools to aid taxonomists, field biologists, and environmental educators. It is imperative that we increase our rate of conducting biological inventories, especially within the tropics, as well as steering youth toward becoming our future scientists. Symbiota web tools strive to integrate biological community knowledge and data in order to synthesize a network of databases and tools that will aid in increasing our overall environmental comprehension.

BiSciCol

AOCR working group chair and hackathon participant Bryan Heidorn and iDigBio staff member Reed Beaman are part of the Biological Science Collections Tracker project. Read more about this endeavor at http://biscicol.blogspot.com/. From their website:

BiSciCol (Biological Science Collections) Tracker is a funded NSF collaborative project with the goal of building an infrastructure designed to tag and track scientific collections and all of their derivatives.

FromThePage

FromThePage is an open-source transcription tool developed by hackathon participant Ben Brumfield. While mainly used to transcribe and index historical or literary material, it has also been used for herpetology field notes. Check out the shared instance at FromThePage.com, and read more about the world of crowdsourced transcription and OCR correction at Ben's blog.

From the website:

FromThePage is free software that allows volunteers to transcribe handwritten documents on-line. It's easy to index and annotate subjects within a text using a simple, wiki-like mark-up. Users can discuss difficult writing or obscure words within a page to refine their transcription. The resulting text is hosted on the web, making documents easy to read and search.

DarwinCore Parser

The DarwinCore Parser comes to us from hackathon participant Michael Giddens and is written in Node.js. It is a very new tool that SilverBiology is developing for the OCR meeting and some internal projects. See: https://github.com/silverbiology/dwc-parser

  • Will be used to send OCR as text and:
    • Get Dates from blob of text with ratings
    • Get Lat/Lng in various formats
    • Wrapper for GlobalNames and GBIF Checklist Bank for Name Recognition
    • Higher Taxa lookup from GBIF Checklist Bank
    • Wrapper for Python Lat/Lng format converter
    • Type status detection
    • Experimental: Using geos and GDAM for higher geography lookup and potentially any shapefile lookup, such as ecological data.
    • more stuff.... looking for help...
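
To make the "Get Lat/Lng in various formats" item concrete, here is a minimal sketch in Python of pulling coordinates out of OCR text and converting them to decimal degrees. It is not taken from dwc-parser (which is written in Node.js). The function names, the regular expression, and the sample label text are all hypothetical, and only degree-sign notations with a trailing hemisphere letter are handled.

```python
import re

# Hypothetical pattern: degrees (optionally decimal), optional minutes and
# seconds, then a hemisphere letter. Covers 33°25'12"N, 33°25'N and 33.42°N.
COORD = re.compile(
    r"""(?<![\d.])(?P<deg>\d{1,3}(?:\.\d+)?)\s*[°º]\s*
        (?:(?P<min>\d{1,2}(?:\.\d+)?)\s*['′]\s*)?
        (?:(?P<sec>\d{1,2}(?:\.\d+)?)\s*(?:"|″)\s*)?
        (?P<hem>[NSEW])""",
    re.VERBOSE,
)


def to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def find_coordinates(text):
    """Return (decimal_degrees, hemisphere) pairs in order of appearance."""
    results = []
    for match in COORD.finditer(text):
        degrees = float(match.group("deg"))
        minutes = float(match.group("min") or 0)
        seconds = float(match.group("sec") or 0)
        hemisphere = match.group("hem")
        decimal = to_decimal(degrees, minutes, seconds, hemisphere)
        results.append((round(decimal, 6), hemisphere))
    return results


if __name__ == "__main__":
    label = 'Maricopa Co. 33°25\'12"N 111°56\'24"W, 12 May 1987'
    print(find_coordinates(label))
```

A real parser would also need to cope with OCR noise (for example a degree sign read as "o" or "0"), which this sketch deliberately ignores.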

The Apiary Project

Jason Best is an AOCR working group member, hackathon participant, host for the hackathon at BRIT, and Co-PI on the Apiary Project. Find out more about Apiary at http://www.apiaryproject.org. From the website:

Our study addresses this research problem: What workflow provides for a combination of machine-assisted and human-assisted procedures to most effectively and efficiently convert textual data on specimen labels into machine-processable parsed data to ingest in a database and associate with the digitized specimen? The goal of this project is to answer this question.

  • The project goal will be accomplished through the following objectives:
    • Identify and test machine processes for initial transformation of label data
    • Identify human processes that act on the machine-transformed data to correct and enhance label data
    • Develop, test, and assess user interfaces to support human processes
    • Develop and test a workflow that incorporates both human- and machine-assisted procedures for effectiveness and efficiency in label data transformation and enhancement
    • Assess quality of metadata resulting from machine and human processes

Biodiversity Heritage Library

AOCR working group members and hackathon participants William Ulate and John Mignault both work on the Biodiversity Heritage Library Project. For an introduction to this project, see http://biodiversitylibrary.org. From the Who We Are section of the website's about page:

The Biodiversity Heritage Library (BHL) is a consortium of natural history and botanical libraries that cooperate to digitize and make accessible the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.” The BHL consortium works with the international taxonomic community, rights holders, and other interested parties to ensure that this biodiversity heritage is made available to a global audience through open access principles. In partnership with the Internet Archive and through local digitization efforts, the BHL has digitized millions of pages of taxonomic literature, representing tens of thousands of titles and over 100,000 volumes.

Macrofungi Collection Consortium TCN

Hackathon participant Scott Bates manages the Symbiota (see above) data portal (mycoportal.org) for the Macrofungi Collection Consortium (MaCC) TCN.

From the MaCC project summary:

Scientists in the United States have been studying and collecting macrofungi for the past 150 years, which has produced a legacy of some 1.4 million dried scientific specimens, in 35 institutions in 24 states. These institutions joined forces in an effort to digitize and share online data associated with these specimens. The resulting resource will enable a national census of macrofungi, and will allow researchers to better understand the diversity of these organisms.

Learn more about the Macrofungi Collection Consortium at the MaCC project page.

ScioQualis.com / ScioTR

Hackathon participants Paul Schroeder and Robin Schroeder are co-founders of ScioQualis.com. Scio Qualis is a cloud-based Natural History Collections Management SaaS offering hosted in the Windows Azure cloud. Paul and Robin are also developers of ScioTR, a touch-enabled Windows 8 Metro app designed for rapid data entry. ScioTR is a companion product that leverages OCR technology, is designed for crowdsourcing, and integrates with http://ScioQualis.com.

For the iDigBio hackathon, Paul and Robin are exploring a queue-based workflow process using Microsoft Service Bus from the Cloud. Use of the Advanced Message Queuing Protocol (AMQP) would ensure non-Windows applications have easy access (see http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol). An AMQP hosted service bus can be accessed by developers regardless of whether they are using Java, PHP, Python, Node.js, C, Ruby, Perl, JavaScript or another language.
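
The shape of a queue-based workflow can be sketched without any broker at all. The Python example below uses the standard library's in-process `queue.Queue` as a stand-in for a message queue. It does not speak AMQP and does not use Microsoft Service Bus. The stage names, the `run_pipeline` function, and the toy `ocr` and `parse` callables are all assumptions made for illustration.

```python
import queue
import threading


def run_pipeline(images, ocr, parse):
    """Pass each image through an OCR stage and a parse stage via queues.

    Each stage runs in its own thread and only talks to its neighbours
    through a queue, which is the property a hosted message bus would
    provide across machines and languages.
    """
    ocr_queue = queue.Queue()
    parse_queue = queue.Queue()
    results = []
    stop = object()  # sentinel marking the end of the work

    def ocr_worker():
        while True:
            image = ocr_queue.get()
            if image is stop:
                parse_queue.put(stop)  # tell the next stage to finish
                break
            parse_queue.put((image, ocr(image)))

    def parse_worker():
        while True:
            item = parse_queue.get()
            if item is stop:
                break
            image, text = item
            results.append((image, parse(text)))

    workers = [
        threading.Thread(target=ocr_worker),
        threading.Thread(target=parse_worker),
    ]
    for worker in workers:
        worker.start()
    for image in images:
        ocr_queue.put(image)
    ocr_queue.put(stop)
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Toy stand-ins: "OCR" upper-cases the file name, "parse" counts characters.
    print(run_pipeline(["a.jpg", "b.jpg"], str.upper, len))
```

With a real broker, each worker would be a separate process, possibly written in a different language, and the two queues would be named queues on the bus.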



Back to the 2013 AOCR Hackathon Wiki