LISTSERV - MUSEUM-L Archives - HOME.EASE.LSOFT.COM

MUSEUM-L Archives

Museum discussion list

MUSEUM-L@HOME.EASE.LSOFT.COM

Sender:

Museum discussion list <[log in to unmask]>

Date:

Wed, 22 Aug 2012 16:16:41 -0400

Reply-To:

Museum discussion list <[log in to unmask]>

Message-ID:

<[log in to unmask]>

Subject:

iDigBio Augmenting OCR October Workshop, February Hackathon Invitation

Organization:

iDigInfo

From:

Deb Paul <[log in to unmask]>

Invitation to Augmented OCR Best Practices Workshop and Hack-a-thon 
Planning

iDigBio (https://www.idigbio.org/) is running a workshop (October 1-2, 
2012) and hack-a-thon (February 2013) to identify best practices and 
develop tools to get information from museum labels into computers.

We are seeking individuals to participate in the "iDigBio Augmenting 
OCR" workshop on October 1-2. The objective of the workshop is to 
improve OCR output and subsequent manipulation by algorithms to extract 
the content of biological collection specimen labels and notes and have 
them efficiently and accurately inserted into a database for future use. 
Participants in the October workshop will narrow the hack-a-thon's 
focus to specific programming goals for the software developers taking 
part in the hack-a-thon in February 2013.

Broadly, digitization can be divided into four main steps: create an 
image; convert the image to text using Optical Character Recognition 
(OCR) and/or human typists; break the text into semantically useful 
fields such as family, scientific name, collector, date collected, 
location, habitat and growth habit; and finally format this 
information for insertion into a database. The 
participants will help to identify and collect images that are 
representative of those that will be needed by the biology community. 
This collection of images will serve as the working set for developers 
in the February Hack-a-thon.
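
As a rough illustration of the last two steps (splitting text into fields 
and loading them into a database), here is a minimal Python sketch. The 
label text, the field keywords, and the table layout are all invented for 
this example; real specimen labels rarely carry such tidy "Keyword: value" 
lines, which is exactly the problem the hack-a-thon is meant to address.

```python
import re
import sqlite3

# Hypothetical OCR output for one specimen label (invented for illustration).
OCR_TEXT = """Family: Asteraceae
Scientific name: Helianthus annuus
Collector: J. Smith
Date collected: 1923-07-14
Location: Leon County, Florida
Habitat: roadside"""

# Assumed mapping from label keywords to database column names.
FIELD_MAP = {
    "family": "family",
    "scientific name": "scientific_name",
    "collector": "collector",
    "date collected": "date_collected",
    "location": "location",
    "habitat": "habitat",
}


def parse_label(text):
    """Step 3: split 'Keyword: value' lines into named fields."""
    record = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.+)", line)
        if match:
            key = match.group(1).strip().lower()
            if key in FIELD_MAP:
                record[FIELD_MAP[key]] = match.group(2).strip()
    return record


def insert_record(conn, record):
    """Step 4: format the fields and insert them into a database table."""
    columns = list(FIELD_MAP.values())
    conn.execute(
        "CREATE TABLE IF NOT EXISTS specimens (%s)" % ", ".join(columns)
    )
    placeholders = ", ".join("?" for _ in columns)
    # Fields missing from the label are stored as NULL.
    conn.execute(
        "INSERT INTO specimens VALUES (%s)" % placeholders,
        [record.get(column) for column in columns],
    )


record = parse_label(OCR_TEXT)
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
insert_record(conn, record)
```

Steps 1 and 2 (imaging and OCR) are left out on purpose, since they depend 
on the equipment and OCR software each collection uses.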

The October workshop participants plan to identify OCR output products 
that will be useful for the community as well as metrics that help 
evaluate how well different automation approaches produce these 
products. These metrics may include the accuracy of the OCR itself, the 
accuracy of automated error correction, and the effectiveness of 
breaking text into meaningful semantic units, measured for example by 
precision, recall and F-score. We seek biologists, programmers and 
others involved in the digitization process to help plan the February 
hack-a-thon at this October workshop and to participate in the 
hack-a-thon itself.
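
For readers unfamiliar with these measures, the following Python sketch 
shows one possible way to compute precision, recall and F-score for the 
fields extracted from a single label. The field names and values are 
invented, and exact matching of (field, value) pairs is only one of several 
scoring conventions the workshop might settle on.

```python
def precision_recall_f1(predicted, gold):
    """Compare extracted (field, value) pairs against hand-checked ones."""
    predicted_pairs = set(predicted.items())
    gold_pairs = set(gold.items())
    true_positives = len(predicted_pairs & gold_pairs)

    # Precision: share of extracted fields that are correct.
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    # Recall: share of the true fields that were found.
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical hand-checked transcription of a label.
gold = {
    "family": "Asteraceae",
    "scientific_name": "Helianthus annuus",
    "collector": "J. Smith",
    "date_collected": "1923-07-14",
}

# Hypothetical automated output: one misspelled name, one missing field.
predicted = {
    "family": "Asteraceae",
    "scientific_name": "Helianthus annus",
    "collector": "J. Smith",
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
```

In this made-up case two of the three extracted fields are correct 
(precision 2/3) and two of the four true fields are recovered (recall 1/2).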

Our wish list of possible goals for optimizing the machine and natural 
language processing algorithms used on OCR output from specimen labels 
is available to anyone at http://tinyurl.com/OCRHackathonWishList. If 
you are interested in participating or would like to know more, please 
email Debbie Paul, [log in to unmask], as soon as possible. The deadline 
to sign up for the October 1-2 workshop is Thursday, August 30th.

Looking forward to your participation, from all of us in the iDigBio 
Augmenting OCR Working Group.
Please forward to other interested listservs. Thanks.
(Note: this has already been posted to TAXACOM, NHCOLL and TDWG.)

-- 
Deborah Paul
User Services, iDigBio
Institute for Digital Information, iDigInfo
Florida State University
Tallahassee, Florida 32308
850-644-6366

=========================================================
Important Subscriber Information:

The Museum-L FAQ file is located at http://www.finalchapter.com/museum-l-faq/ . You may obtain detailed information about the listserv commands by sending a one line e-mail message to [log in to unmask] . The body of the message should read "help" (without the quotes).

If you decide to leave Museum-L, please send a one line e-mail message to [log in to unmask] . The body of the message should read "Signoff Museum-L" (without the quotes).
