MUSEUM-L Archives

Museum discussion list

MUSEUM-L@HOME.EASE.LSOFT.COM

Subject:
From: Kaldenbach <[log in to unmask]>
Reply To: Museum discussion list <[log in to unmask]>
Date: Thu, 17 Oct 2002 14:24:41 +0200
Content-Type: text/plain
Parts/Attachments: text/plain (95 lines)

Dear List,

OCR scanning is a tricky business indeed when
a) pages have more than one column
b) the typeface is small
c) the typeface is 19th-century or otherwise unusual.

On my page on Delft Artists and Patrons you will see the Obreen section, which is the result of a lengthy OCR process. Go to the very end of part 2 and you will see what a terrible mess unedited OCR output can be. I had to manually correct about 3 to 10 errors per line on 100 of the 120 pages. See
www.xs4all.nl/~kalden

Alternatively, I would recommend going to a text-entry company in India or another low-cost country.
There are methods for typing large amounts of text at a low price.
The text is typed character by character by 3 different typists on 3 different computers, and software then compares the three versions line by line, page by page, to weed out errors. Even after that lengthy process, prices are low.
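
For those curious how such a comparison might work, here is a minimal sketch in Python. It is only an illustration, not the software any typing company actually uses: the function name reconcile_lines and the majority-vote rule (accept a line when at least two of the three typists agree, otherwise flag it for a human) are my own assumptions.

```python
from collections import Counter
from itertools import zip_longest


def reconcile_lines(typist_a, typist_b, typist_c):
    """Compare three independent transcriptions line by line.

    Returns (merged, needs_review):
      merged       - one line of text per input line
      needs_review - 1-based line numbers where no two typists agreed
    Hypothetical helper; the majority-vote rule is an assumption.
    """
    merged = []
    needs_review = []
    # zip_longest pads a shorter transcription with empty lines,
    # so a skipped line shows up as a disagreement instead of being lost.
    rows = zip_longest(typist_a, typist_b, typist_c, fillvalue="")
    for line_number, versions in enumerate(rows, start=1):
        text, votes = Counter(versions).most_common(1)[0]
        merged.append(text)
        if votes < 2:
            # All three differ: keep one version as a placeholder
            # and flag the line for a human proofreader.
            needs_review.append(line_number)
    return merged, needs_review


if __name__ == "__main__":
    a = ["Delft Artists", "and Patrons"]
    b = ["Delft Artists", "and Patrons"]
    c = ["Delft Artlsts", "and Patrons"]  # one typist made a typo
    merged, needs_review = reconcile_lines(a, b, c)
    print(merged)
    print(needs_review)
```

The idea is that three typists rarely make the same mistake in the same place, so a single typo is outvoted by the other two versions, and only lines where all three disagree need a human eye.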

The largest single-language dictionary set in the world, which happens to be the Dutch WNT (Woordenboek der Nederlandse Taal), was published from the nineteenth century onwards, and the process described above was used to make a digital CD-ROM version of this magnificent dictionary.

Scanning a page as an image and presenting the result as a JPG picture is an option, but for the necessary crispness one needs an awful lot of disk space, and in the end the result remains a picture, so character or word recognition is not possible.
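
To give a rough feel for the disk space involved, here is a small back-of-the-envelope calculation in Python. The function name raw_scan_bytes is hypothetical, and the page size, resolution and colour depth in the usage example are assumptions of mine, not figures from any particular project. It computes the uncompressed size of a scan; a JPG file will be smaller, by an amount that depends on the quality setting.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a page scanned at the given resolution.

    width_in, height_in - page size in inches
    dpi                 - scan resolution in dots per inch
    bits_per_pixel      - 8 for greyscale, 24 for full colour
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed example: an A4 page (8.27 x 11.69 in), 300 dpi, 24-bit colour.
    size = raw_scan_bytes(8.27, 11.69, 300, 24)
    print(f"{size / 1_000_000:.1f} MB uncompressed per page")
```

Under those assumed settings a single page comes to roughly 26 MB before compression, so a catalogue of a few hundred pages adds up quickly.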

I am sure Google will lead you to the right place in India.

I hope this can be of service.

Yours,

Kees Kaldenbach
www.johannesvermeer.org




Patti Davis-Perkins wrote:

> I would be greatly interested in any information one could provide on this
> topic as we are currently debating whether to OCR similar documents OR
> simply scan as JPEGS and store as an image in our database.
>
> Patricia Davis-Perkins
> Coordonnatrice - Numérisation des collections/Coordinator - Collections
> Digitization
> Bibliothèque, archives et services de documentation/Library, Archives and
> Documentation services
> Musée canadien des civilisations/Canadian Museum of Civilization
> Tél : (819) 776-8456
> mailto:[log in to unmask]
>
> -----Original Message-----
> From: Greenberg, Ted [mailto:[log in to unmask]]
> Sent: October 16, 2002 8:18 PM
> To: [log in to unmask]
> Subject: Scanning
>
> I would like to know if any RC members have experience with OCR scanning of
> printed documents.  We are thinking of scanning the text from our collection
> catalogues as a simple way of entering huge amounts of data into our CMS
> system (Multi-MIMSY).  Data would include bibliographies, exhibition
> histories, descriptions, provenance and essays for each artwork in a
> particular collection, eg, Indian painting collection, etc.
>
> Specifically, I would like to know if you outsourced the scanning.  If so,
> what was the average cost per page, or pricing structure?  Who did you use?
> Or, did you purchase scanning equipment and perform the scanning internally?
> What sort of unexpected costs or delays were encountered, if any?
>
> Please respond offline to Renee Montgomery, Asst. Director, Collections
> Management, Los Angeles County Museum of Art
> email:  [log in to unmask]; phone:  323 857-6059
>
> =========================================================
> Important Subscriber Information:
>
> The Museum-L FAQ file is located at
> http://www.finalchapter.com/museum-l-faq/ . You may obtain detailed
> information about the listserv commands by sending a one line e-mail message
> to [log in to unmask] . The body of the message should read "help"
> (without the quotes).
>
> If you decide to leave Museum-L, please send a one line e-mail message to
> [log in to unmask] . The body of the message should read "Signoff
> Museum-L" (without the quotes).