Invitation to Augmented OCR Best Practices Workshop and Hack-a-thon
Planning
iDigBio (https://www.idigbio.org/) is running a workshop (October 1-2,
2012) and hack-a-thon (February 2013) to identify best practices and
develop tools to get information from museum labels into computers.
We are seeking individuals to participate in the "iDigBio Augmenting
OCR" workshop on October 1-2. The objective of the workshop is to
improve OCR output and its subsequent algorithmic processing, so that
the content of biological collection specimen labels and notes can be
extracted and inserted into a database efficiently and accurately for
future use.
Participants in the October workshop plan to narrow the hack-a-thon
focus down to specific programmatic goals for the software developers
who will work at the hack-a-thon in February 2013.
Most broadly, digitization can be described in four main steps: create an
image; convert the image to text using Optical Character Recognition
(OCR) and/or human typists; break the content of the text into
semantically useful fields such as family, scientific name, collector,
date collected, location, habitat, growth habit and others; and finally
format this information for insertion into a database. The
participants will help to identify and collect images that are
representative of those that will be needed by the biology community.
This collection of images will serve as the working set for developers
in the February Hack-a-thon.
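
As a rough illustration of the last two steps (splitting OCR text into fields and formatting the result for a database), here is a minimal Python sketch. The label text, the field names, and the parsing rules (a family name ending in "-aceae" on its own line, a "Coll." prefix before the collector and date) are all assumptions made up for this example; real labels vary far more, which is exactly why a representative working set of images is needed.

```python
import json
import re

# Hypothetical OCR output for one herbarium label (invented for illustration).
OCR_TEXT = """Asteraceae
Solidago odora Aiton
Leon County, Florida. Dry pine woods.
Coll. J. Smith  12 Oct 1956"""


def parse_label(text):
    """Split OCR text into a few named fields using naive, hand-written rules."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    record = {
        "family": None,
        "scientific_name": None,
        "collector": None,
        "date_collected": None,
    }
    for i, ln in enumerate(lines):
        # Assumption: the family name ends in "-aceae" and sits on its own line.
        if record["family"] is None and re.fullmatch(r"[A-Z][a-z]+aceae", ln):
            record["family"] = ln
            # Assumption: the scientific name is the line right after the family.
            if i + 1 < len(lines):
                record["scientific_name"] = lines[i + 1]
        # Assumption: collector and date follow a "Coll." prefix on one line.
        m = re.search(r"Coll\.\s+(.+?)\s+(\d{1,2} [A-Z][a-z]{2} \d{4})$", ln)
        if m:
            record["collector"] = m.group(1)
            record["date_collected"] = m.group(2)
    return record


def to_json(record):
    """Serialize a parsed record so it can be handed to a database loader."""
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(to_json(parse_label(OCR_TEXT)))
```

Rules this simple break as soon as the OCR misreads a character or the label layout changes, so the sketch is only meant to make the target output concrete.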
The October workshop participants plan to identify OCR output products
that will be useful for the community as well as metrics that help
evaluate how well different automation approaches produce these
products. These may include measures of OCR accuracy, of the accuracy of
automated error correction, and of the effectiveness of breaking text
into meaningful semantic units, using metrics such as precision, recall
and F-score. We seek biologists, programmers and others involved in the
digitization process to participate in this October workshop, to plan
the February hack-a-thon, and to participate in the hack-a-thon itself.
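
To make the metrics concrete, here is a small Python sketch of how precision, recall and F-score could be computed for one parsed label, comparing extracted fields against a hand-keyed reference. The function name, the field names, and the exact-match rule are assumptions for illustration only; the workshop may well settle on different definitions (for example, partial credit for near matches).

```python
def field_scores(predicted, expected):
    """Return (precision, recall, f_score) for one label.

    Both arguments map field name -> value, with None meaning "no value".
    A predicted field counts as correct only if it exactly equals the
    reference value (an assumption of this sketch).
    """
    correct = sum(
        1 for k, v in predicted.items()
        if v is not None and expected.get(k) == v
    )
    n_predicted = sum(1 for v in predicted.values() if v is not None)
    n_expected = sum(1 for v in expected.values() if v is not None)

    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_expected if n_expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical reference (hand-keyed) and automated output for one label.
expected = {
    "family": "Asteraceae",
    "scientific_name": "Solidago odora Aiton",
    "collector": "J. Smith",
    "date_collected": "12 Oct 1956",
}
predicted = {
    "family": "Asteraceae",
    "scientific_name": "Solidago odora Aiton",
    "collector": "J. Srnith",  # simulated OCR error: "m" read as "rn"
    "date_collected": None,    # field missed entirely
}

if __name__ == "__main__":
    print(field_scores(predicted, expected))
```

Here two of the three extracted fields are correct (precision 2/3) and two of the four reference fields are recovered (recall 1/2), giving an F-score of 4/7.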
Our wish list of possible goals for optimizing the machine and natural
language processing algorithms used on OCR output from specimen labels
is open to anyone at http://tinyurl.com/OCRHackathonWishList
If you are interested in participating and would like to know more,
please email Debbie Paul, [log in to unmask], as soon as possible. The
deadline to participate in the October 1-2 workshop is Thursday, August 30th.
Looking forward to your participation, from all of us in the iDigBio
Augmenting OCR Working Group
Please forward to other interested listservs - thanks.
(Note it's already been posted to TAXACOM, NHCOLL and TDWG).
--
Deborah Paul
User Services, iDigBio
Institute for Digital Information, iDigInfo
Florida State University
Tallahassee, Florida 32308
850-644-6366