IO-Compress-Brotli
brotli/tests/testdata/lcet10.txt
FLEISCHHAUER proceeded to describe various search and retrieval
capabilities while ERWAY worked the computer. In this particular package
the "go to" pull-down allowed the user in effect to jump out of Toolbook,
where the interactive program was located, and enter the third-party
software used by AM for this text collection, which is called Personal
Librarian. This was the Windows version of Personal Librarian, a
software application put together by a company in Rockville, Md.
Since the broadsides came from the Revolutionary War period, a search was
conducted using the words British or war, with the default operator reset
as or. FLEISCHHAUER demonstrated both automatic stemming (which finds
other forms of the same root) and a truncated search. One of Personal
Librarian's strongest features, the relevance ranking, was represented by
a chart that indicated how often words being sought appeared in
documents, with the one receiving the most "hits" obtaining the highest
score. The "hit list" that is supplied takes the relevance ranking into
account, making the first hit, in effect, the one the software has
selected as the most relevant example.
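
The hit-count ranking described above can be illustrated with a minimal
sketch. It assumes the simplest possible scoring (count every occurrence
of any query word, using the "or" operator) and is not Personal
Librarian's actual algorithm, which the text does not specify. The
function name and the sample broadsides are hypothetical.

```python
import re
from collections import Counter

def rank_by_hits(documents, query_terms):
    """Rank documents by total occurrences of any query term (OR semantics)."""
    terms = {t.lower() for t in query_terms}
    scored = []
    for name, text in documents.items():
        # Count every lowercase word in the document.
        words = Counter(re.findall(r"[a-z]+", text.lower()))
        hits = sum(words[t] for t in terms)
        if hits > 0:
            scored.append((name, hits))
    # Highest hit count first; ties broken by name for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical sample texts, invented for illustration only.
broadsides = {
    "broadside_a": "The British fleet arrived; the war with the British continues.",
    "broadside_b": "A proclamation concerning the war.",
    "broadside_c": "Notice of a town meeting.",
}
hit_list = rank_by_hits(broadsides, ["British", "war"])
print(hit_list)
```

The first entry of the resulting list plays the role of the "first hit":
the document with the most occurrences of the words being sought.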
While in the text of one of the broadside documents, FLEISCHHAUER
remarked on AM's attempt to find ways to connect cataloging to the texts,
which it does in different ways in different manifestations. In the case
shown, the cataloging was pasted on: AM took MARC records that were
written as on-line records right into one of the Library's mainframe
retrieval programs, pulled them out, and handed them off to the contractor,
who massaged them somewhat to display them in the manner shown. One of
AM's questions is, Does the cataloging normally performed in the mainframe
work in this context, or ought AM to think through adjustments?
FLEISCHHAUER made the additional point that, as far as the text goes, AM
has gravitated towards SGML (he pointed to the boldface in the upper part
of the screen). Although extremely limited in its ability to translate
or interpret SGML, Personal Librarian will furnish both bold and italics
on screen; a fairly easy thing to do, but it is one of the ways in which
SGML is useful.
Striking a balance between quantity and quality has been a major concern
of AM, and accuracy is one area where project staff have felt that
something less than 100 percent was acceptable.
FLEISCHHAUER cited the example of the standard of the rekeying industry,
namely 99.95 percent; as one service bureau informed him, to go from
99.95 to 100 percent would double the cost.
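
To put the 99.95-percent standard in concrete terms, the short
calculation below estimates how many character errors it leaves in a
text of a given size. The one-million-character collection is an assumed
figure chosen for illustration; it does not come from the presentation.

```python
def expected_errors(char_count, accuracy_percent):
    """Expected number of wrong characters at a given accuracy level."""
    return char_count * (100.0 - accuracy_percent) / 100.0

# Assumed collection size, for illustration only.
errors = expected_errors(1_000_000, 99.95)
print(f"about {errors:.0f} errors per million characters")
```

At this rate roughly one character in every two thousand is wrong.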
FLEISCHHAUER next demonstrated how AM furnishes users recourse to images,
and at the same time recalled LESK's pointed question concerning the
number of people who would look at those images and the number who would
work only with the text. If the implication of LESK's question was
sound, FLEISCHHAUER said, it raised the stakes for text accuracy and
reduced the value of the strategy for images.
Contending that preservation is always a bugaboo, FLEISCHHAUER
demonstrated several images derived from a scan of a preservation
microfilm that AM had made. He awarded a grade of C at best, perhaps a
C minus or a C plus, for how well it worked out. Indeed, the matter of
learning if other people had better ideas about scanning in general, and,
in particular, scanning from microfilm, was one of the factors that drove
AM to attempt to think through the agenda for the Workshop. Skew, for
example, was one of the issues that AM in its ignorance had not reckoned
would prove so difficult.
Further, the handling of images of the sort shown, in a desktop computer
environment, involved a considerable amount of zooming and scrolling.
Ultimately, AM staff feel that perhaps the paper copy that is printed out
might be the most useful one, but they remain uncertain as to how much
on-screen reading users will do.
Returning to the text, FLEISCHHAUER asked viewers to imagine a person who
might be conducting a search in a full-text environment. With this
scenario, he proceeded to illustrate other features of Personal Librarian
that he considered helpful; for example, it provides the ability to
notice words as one reads. Clicking the "include" button on the bottom
of the search window pops the words that have been highlighted into the
search. Thus, a user can refine the search as he or she reads,
re-executing the search and continuing to find things in the quest for
materials. This software not only contains relevance ranking, Boolean
operators, and truncation, but it also permits one to perform word
algebra, so to speak, where one puts two or three words in parentheses
and links them with one Boolean operator and then a couple of words in
another set of parentheses and asks for things within so many words of
others.
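
The "word algebra" just described amounts to a proximity search between
two groups of words. The following is a minimal sketch under stated
assumptions: each parenthesized group is joined by "or", distance is
measured in whole words, and the function name and sample sentence are
hypothetical. It does not reproduce Personal Librarian's query syntax.

```python
import re

def within(text, group_a, group_b, max_distance):
    """True if any word of group_a lies within max_distance words of any word of group_b."""
    words = re.findall(r"[a-z]+", text.lower())
    a = {w.lower() for w in group_a}
    b = {w.lower() for w in group_b}
    # Record the word positions at which each group matches.
    pos_a = [i for i, w in enumerate(words) if w in a]
    pos_b = [i for i, w in enumerate(words) if w in b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

# Hypothetical sample sentence, invented for illustration only.
sample = "The British troops marched while the war raged near Boston harbor"
# (British or war) within 3 words of (Boston or harbor)
near = within(sample, ["British", "war"], ["Boston", "harbor"], 3)
# The same query with a tighter window of 2 words
nearer = within(sample, ["British", "war"], ["Boston", "harbor"], 2)
print(near, nearer)
```

Here "war" and "Boston" are exactly three words apart, so the query
matches with a window of three words but not with a window of two.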
Until they became acquainted recently with some of the work being done in
classics, the AM staff had not realized that a large number of the
projects that involve electronic texts were being done by people with a
profound interest in language and linguistics. Their search strategies
and thinking are oriented to those fields, as is shown in particular by
the Perseus example. As amateur historians, the AM staff were thinking
more of searching for concepts and ideas than for particular words.
Obviously, FLEISCHHAUER conceded, searching for concepts and ideas and
searching for words may be two rather closely related things.
While displaying several images, FLEISCHHAUER observed that the Macintosh
prototype built by AM contains a greater diversity of formats. Echoing a
previous speaker, he said that it was easier to stitch things together in
the Macintosh, though it tended to be a little more anemic in search and
retrieval. AM, therefore, increasingly has been investigating
sophisticated retrieval engines in the IBM format.
FLEISCHHAUER demonstrated several additional examples of the prototype
interfaces: One was AM's metaphor for the network future, in which a
kind of reading-room graphic suggests how one would be able to go around
to different materials. AM contains a large number of photographs in
analog video form worked up from a videodisc, which enable users to make
copies to print or incorporate in digital documents. A frame-grabber is
built into the system, making it possible to bring an image into a window
and digitize or print it out.
FLEISCHHAUER next demonstrated sound recording, which included texts.
Recycled from a previous project, the collection included sixty 78-rpm
phonograph records of political speeches that were made during and
immediately after World War I. These constitute approximately three
hours of audio, as AM has digitized them, and occupy 150 megabytes on a
CD. Thus, they are considerably compressed. From the catalogue card,
FLEISCHHAUER proceeded to a transcript of a speech with the audio
available and with highlighted text following it as it played.
A photograph has been added and a transcription made.
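
The remark that the recordings are considerably compressed can be checked
with a rough calculation. The baseline used here, standard CD audio at
44.1 kHz, 16 bits, and two channels, is an assumption made for
comparison; the presentation does not state the format AM used, and
78-rpm recordings are monaural, so the true ratio may be closer to half
the figure computed. A megabyte is taken as 10^6 bytes.

```python
SECONDS = 3 * 60 * 60                  # approximately three hours of audio
STORED_BYTES = 150 * 1_000_000         # 150 megabytes, assuming 1 MB = 10^6 bytes
CD_BYTES_PER_SECOND = 44_100 * 2 * 2   # assumed baseline: 44.1 kHz, 2 bytes/sample, stereo

# Average data rate of the stored audio, in kilobits per second.
stored_kbps = STORED_BYTES * 8 / SECONDS / 1000

# Size of the same duration as uncompressed CD audio, and the resulting ratio.
uncompressed_bytes = CD_BYTES_PER_SECOND * SECONDS
ratio = uncompressed_bytes / STORED_BYTES

print(f"stored rate: {stored_kbps:.0f} kbit/s")
print(f"uncompressed CD audio: {uncompressed_bytes / 1e9:.2f} GB")
print(f"ratio: about {ratio:.1f} to 1")
```

Under these assumptions three hours of CD-quality audio would need
nearly two gigabytes, far more than the 150 megabytes reported.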
Considerable value has been added beyond what the Library of Congress
normally would do in cataloguing a sound recording, which raises several
questions for AM concerning where to draw lines about how much value it can
afford to add and at what point, perhaps, this becomes more than AM could
reasonably do or reasonably wish to do. FLEISCHHAUER also demonstrated