MUSEUM-L Archives

Museum discussion list

MUSEUM-L@HOME.EASE.LSOFT.COM

Subject:
From:
"Douglas W. St.Clair" <[log in to unmask]>
Reply To:
Museum discussion list <[log in to unmask]>
Date:
Thu, 2 Jul 1998 20:06:18 -0400
>Scanners may be used to get the nuances of handwriting, typewriter fonts, etc.
>and stored in databases as a file.

This comment reminded me of some experience I had converting scanned
documents to computer-editable ones (OCR technology). First, the 95%
accuracy, and similar numbers, quoted in the sales literature is the best
conversion the equipment is capable of doing. This assumes clean paper, a
carbon ribbon (not plain old inked fabric), an electric typewriter, and
so on. If you assume 95% accuracy, then 5 characters out of every 100 are
bad. If you are doing text, you can probably find 90% of the errors with a
spelling checker, but finding the balance can be very time-consuming. If
you are scanning numbers, or material that cannot be spell-checked, then
you must carefully proof every item to find those errors.
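
To put rough numbers on that, here is a small Python sketch of the
arithmetic. The 95% accuracy and 90% spell-check catch rate are the figures
from the paragraph above; the 2,000-character page size and the function
name are only assumptions for illustration.

```python
def ocr_error_estimate(num_chars, accuracy=0.95, spellcheck_catch_rate=0.90):
    """Estimate OCR errors in a text and how many survive a spell check.

    Returns (total_errors, caught_by_spellcheck, left_for_proofreading).
    """
    total_errors = num_chars * (1 - accuracy)
    caught = total_errors * spellcheck_catch_rate
    remaining = total_errors - caught
    return total_errors, caught, remaining


# Hypothetical page of 2,000 characters (assumed size, not from the post).
total, caught, remaining = ocr_error_estimate(2000)
print(f"bad characters:        about {total:.0f}")
print(f"found by spell check:  about {caught:.0f}")
print(f"left for proofreading: about {remaining:.0f}")

# Numbers and similar material cannot be spell-checked, so nothing is caught.
_, _, remaining_numeric = ocr_error_estimate(2000, spellcheck_catch_rate=0.0)
print(f"left when spell check is useless: about {remaining_numeric:.0f}")
```

Even in the best case, the leftover errors are scattered through the page,
which is why tracking them down takes so long.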

There was a company that scanned and converted legal cases and put them
online. They had an interesting strategy: they did not correct the
material. However, they did produce a very sophisticated search engine that
would find stuff even with lots of errors, and the person reading could
generally sort out the meaning.
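
I do not know how that company's search engine actually worked. As a
minimal sketch of the general idea, the Python below matches a query word
against uncorrected OCR text while tolerating a few character errors, using
plain edit (Levenshtein) distance. The function names, the sample sentence,
and the two-error tolerance are all assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_find(query: str, text: str, max_errors: int = 2):
    """Return (word_position, word) pairs within max_errors edits of query."""
    query = query.lower()
    hits = []
    for position, word in enumerate(text.split()):
        candidate = word.strip(".,;:!?\"'()").lower()
        if edit_distance(query, candidate) <= max_errors:
            hits.append((position, candidate))
    return hits


# Made-up sentence with typical OCR damage ("1" for "l", swapped letters).
scanned = "The court found neg1igence by the defendnat."
print(fuzzy_find("negligence", scanned))
print(fuzzy_find("defendant", scanned))
```

A looser tolerance finds more damaged words but also more false matches, so
the reader still has to sort out the meaning, as noted above.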


**************************************************
Douglas W. St.Clair
Tir Na Nog
400 Burton Highway
Wilton, NH 03086-5022
PH: 603-654-9321
FAX: 603-654-5440
EMAIL: [log in to unmask]
**************************************************
