Case study: a book-oriented digital library
Elizabeth Lew<br />
LIS 7410 / Spring 2011<br />
Dr. Wu<br />
elew1@tigers.lsu.edu<br />
Michigan State University Library’s Feeding America:<br />
A Case Study<br />
Feeding America is a library of 76 cookbooks digitized from the Michigan State<br />
University Library’s print collection. These 76 were carefully chosen from 7,000 books<br />
related to culinary arts in the library’s holdings, based on their historical significance<br />
(Longborne). Browsing the collection by publication date reveals that the oldest<br />
cookbook in the collection dates from 1798 (Amelia Simmons’s American Cookery), and<br />
the newest from 1922 (Bertha M. Wood’s Foods of the Foreign-Born in Relation To<br />
Health).<br />
Content Creation<br />
Once chosen, the 76 cookbooks were digitized and transcribed. Those in good condition<br />
were scanned on a color flatbed scanner at 400 dpi, in 24-bit color, and saved as compressed<br />
.TIF files. Those in brittle condition were scanned on an overhead scanner and saved as<br />
600 dpi, 1-bit .TIF files, though the FA site does not specify whether these were<br />
compressed (Digitization). The .TIF masters were then converted to .JPG images of each<br />
scan and to PDF versions of the complete texts. Users can download these versions<br />
directly from the FA site; copies of the .TIF masters are available upon request.<br />
Typists then transcribed the texts from PDF using Note Tab software. According to the<br />
FA website, project overseers chose Note Tab because “it produces only ASCII<br />
text and has tag libraries which can be customized by editing one of the .clb files<br />
accompanying the program. A miniature tag library has been created for the cookbooks,<br />
containing only page breaks, paragraphs, font styles, and an ISO-LAT1 character menu”<br />
(Digitization). The typists produced two copies of each text, which were proofread against<br />
each other using software that compares the two transcriptions.<br />
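The double-keying check described above can be sketched in a few lines of Python. This is only an illustration: the FA site does not name the comparison software, so the function name `find_disagreements` and the sample sentences are assumptions, and the comparison uses the standard-library `difflib` module rather than whatever tool the project actually ran.

```python
from difflib import SequenceMatcher


def find_disagreements(copy_a, copy_b):
    """Return pairs of line groups on which two transcriptions differ.

    Hypothetical helper: each pair holds the lines from the first typist
    and the corresponding lines from the second typist.
    """
    lines_a = copy_a.splitlines()
    lines_b = copy_b.splitlines()
    matcher = SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
    disagreements = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        # "equal" spans are lines both typists keyed identically.
        if tag != "equal":
            disagreements.append((lines_a[a1:a2], lines_b[b1:b2]))
    return disagreements


# Invented sample text; the second copy contains one typing slip.
typist_one = "Take one pint of milk.\nAdd two eggs.\nBake slowly."
typist_two = "Take one pint of milk.\nAdd two egqs.\nBake slowly."

for first, second in find_disagreements(typist_one, typist_two):
    print("typist one:", first, "| typist two:", second)
```

Because two typists are unlikely to make the same slip in the same place, any span flagged here points a proofreader to a spot worth checking against the page image.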
Metadata<br />
Once transcribed, the text of each cookbook was encoded using XML and a DTD<br />
formulated especially for the Feeding America project.<br />
The entire transcription file for a single cookbook (including the element) is<br />
enclosed in the wrapper element assigned to each individual work.
Attributes for this element include: type (example values are: famous, restaurant, charity),<br />
chefschool, histperiod, region, ethnicgroup, occasion (example values include: weddings,<br />
Thanksgiving, winter, etc.). (Encoding)<br />
Project overseers decided to use Dublin Core standards to organize resource description<br />
for each work. This description, based on Dublin Core’s 15 elements, is located at the<br />
beginning of the file, before the cookbook’s body text, and marked by the tag.<br />
(Encoding) The Dublin Core elements are: contributor, coverage (“The spatial or<br />
temporal topic of the resource”), creator, date, description, format, identifier, language,<br />
publisher, relation, rights, source, subject, title, and type. (Dublin Core)<br />
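As a rough illustration of how a description could be checked against the element names listed above, here is a minimal Python sketch. The record below is an invented example (it is not the actual Feeding America record for the book), and the helper name `unrecognized_fields` is an assumption.

```python
# The fifteen Dublin Core element names listed in the paragraph above.
DUBLIN_CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unrecognized_fields(record):
    """Return the keys of a record that are not Dublin Core element names."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# Invented sample record. "histperiod" is one of the project-specific
# attributes mentioned earlier, not a Dublin Core element, so it is flagged.
sample_record = {
    "title": "American Cookery",
    "creator": "Simmons, Amelia",
    "date": "1798",
    "histperiod": "example value",
}

print(unrecognized_fields(sample_record))
```

The point of the sketch is the separation of concerns: the generic description sticks to the shared Dublin Core vocabulary, while culinary detail lives in the project's own attributes.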
Other metadata is located throughout the cookbook’s body text, marked by the <br />
element. The Feeding America DTD also allows for , , and<br />
elements under the element. Beneath these elements, <br />
and elements can be used for edibles and non-edibles, respectively. Still<br />
further, the FA DTD also provides for sub-elements such as (to mark<br />
uncommon processes mentioned in the text such as “braise” or “poach”; “stir” and<br />
“roast” are too common to note), , and (these would mark<br />
textual mention of uncommon tools such as “ramekin” or “double boiler”, but not<br />
“spoon” or “dish”). (Encoding) Elements that describe the hierarchy of sections of each<br />
cookbook from the largest to smallest can be described by “class1”<br />
and “class2” attributes, which use defined values (like a very small controlled vocabulary<br />
thesaurus) to describe types of food. Examples of possible values include:<br />
“eggcheesedairy”, “soups”, “fruitvegbeans”, and “meatfishgame”. (DTD)<br />
Interface and Pages<br />
The Feeding America portal displays links to the interactive parts of the website in a<br />
horizontal tool bar laid out along the top of the page. Below it, news and updates fill the<br />
main portion of the page, and sample images from the collection fill the right side.<br />
The top tool bar consists of eight links: Home, The Project, The Authors, Browse the<br />
Collection, Search the Collection, Museum Objects, Glossary, and Partners.
The Project provides a description of the Feeding America project, where much of the<br />
background information in this paper was gathered. This area describes the scope and<br />
nature of the collection, as well as the process of digitizing, coding, and building the<br />
Feeding America portal. This section also provides press links and a 15-minute video<br />
describing the collection and why some of the 76 books were chosen to be digitized.<br />
Browsing<br />
The Authors, Browse, Museum Objects, and Glossary pages are all presented similarly:<br />
as menus, with a brief description of the contents of the list at the top of the page, and<br />
directly beneath, the alphabet laid out as a menu, with each letter linking to the section of<br />
the list body that corresponds with that letter. Below the alphabet menu is the beginning<br />
of the alphabetized list, which can be entirely scrolled through as well.<br />
The Glossary is a list of culinary terms, and does not provide opportunity for much<br />
further interaction. The definition of each term is given directly after the alphabetized<br />
entry, and only a few entries link to a different page.<br />
Each entry listed on the Browse, Authors, and Museum Objects pages links to a<br />
unique page for that entry. The author menu allows users to browse an alphabetical list<br />
of cookbook authors. Each entry is also a link to a unique page created for that author,<br />
containing a reference to the cookbook she or he authored, as well as a brief biography. The<br />
Objects page allows users to browse an alphabetical list of cooking instruments housed in<br />
the Michigan State University Museum. Likewise, each entry is also a link to a unique<br />
page created for each instrument. Object pages include a photo of the utensil as well as a
description of the object’s history and purpose.<br />
Author browse menu<br />
Object browse menu<br />
The Browse page is devoted to the cookbooks. Initially, the list of entries (each<br />
formatted as cookbook title, author, and date of publication) is arranged alphabetically.<br />
However, there are links at the top of the page to view versions of the list arranged by<br />
date and by interest. Arranged by date, the list is broken up into groups of three-year<br />
intervals. Arranged by interest, the list is broken up into groups such as “Diet, nutrition,<br />
health, health movements, vegetarianism; Charity and Church cookbooks; Hotels,<br />
restaurants, etc; Markets and produce; Military cooking; Regional American Cooking:<br />
Midwest, Northeast, South and Border states, and West; Ethnic Influences: African,<br />
Asian, Creole, English, French,” and so on. Each of these interest pages in turn pulls up a<br />
short list of related titles. These lists are hand-coded, and not sets of retrieved results.
Searching<br />
The Search Page offers 4 options for searching: Book Author, Book Title, Recipe Name,<br />
and Ingredients. The search function does not provide an option to browse results,<br />
whether or not a query was successful. Author, Title, and Recipe searches all allow the<br />
use of search strings, like “chocolate cake” (Search Help). They also allow searching by<br />
partial name. For instance, an author search for “Merritt” will retrieve a link to “Farmer,<br />
Fannie Merritt.” Keyword searching also works for books and recipes. For example, a<br />
title search for “Creole” retrieves links to both Cooking in Old Creole Days and La<br />
Cuisine Creole. A recipe search for “Creole” retrieves links to titles such as “Entrecote<br />
Creole” (a steak dish), “Sauce Creolle”, and “Baked Tomatoes, Creole<br />
Style.”
Pros and Cons<br />
The retrieval mechanism is adept at identifying words in the middle of a phrase, title, or<br />
word. This is useful, especially if the user remembers a keyword that does not fall<br />
at the beginning of a title. However, this quality can also be frustrating, as it may<br />
pick up keywords that are actually parts of other, unrelated words. For<br />
instance, a search for “Pie” will return results for “Pumpkin Pie” (useful), as well as<br />
“Artichokes Stuffed with Meat (Carciofi ripieni di carne)” (not useful, since the only<br />
match is the “pie” inside “ripieni”).<br />
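The difference between the substring matching observed above and a stricter whole-word match can be sketched as follows. This is an illustration in Python, not the Feeding America implementation; the function names are assumptions, and the two recipe names are the ones quoted in the paragraph above.

```python
import re

RECIPE_NAMES = [
    "Pumpkin Pie",
    "Artichokes Stuffed with Meat (Carciofi ripieni di carne)",
]


def substring_search(term, names):
    """Match the term anywhere, even inside a longer word."""
    needle = term.lower()
    return [name for name in names if needle in name.lower()]


def whole_word_search(term, names):
    """Match the term only when it stands alone between word boundaries."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [name for name in names if pattern.search(name)]


print(substring_search("Pie", RECIPE_NAMES))   # both names: "ripieni" contains "pie"
print(whole_word_search("Pie", RECIPE_NAMES))  # only "Pumpkin Pie"
```

The stricter version has its own cost: a whole-word search for “Pie” would no longer find a title containing “Pies”, so a real system would likely pair it with stemming or let the user choose between the two modes.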
Another problem with the Feeding America retrieval mechanism is that there is no<br />
thesaurus to funnel synonyms into a single search keyword. A search for crawfish<br />
returns recipes for “Ecrevisses en buisson. Boiled Crawfish”, “Ecrevisses Bordelaise.<br />
Crawfish a la Bordelaise”, “Oeufs brouisses aux queues d’ecrevisse. Scrambled Eggs<br />
with Crawfish Tails”, “Omelette Celestine. Omelet with Crawfish or Lobsters”, “Bisque<br />
of Crab or Crawfish”, and “Crawfish Bisque”. Meanwhile, a search for crayfish turns up<br />
a completely different set of recipes: “Crayfish Bisque a la Creole”, “Crayfish Bisque. A<br />
Creole Dish”, “Crayfish Soup or Bisque”, “Crayfish-Broth for Purifying the Blood”, and<br />
“Crabs and Crayfish”. Separate searches for recipes using one ingredient with two names<br />
pull up two separate sets of results, one for each name.<br />
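A thesaurus of the kind described above could be as small as a list of synonym groups consulted before matching. The Python sketch below is a hypothetical illustration: `SYNONYM_RINGS`, `expand_term`, and `search_with_synonyms` are invented names, and the recipe names are a few of those quoted above.

```python
# Each set groups spellings that should be treated as one search concept.
SYNONYM_RINGS = [
    {"crawfish", "crayfish"},
]

RECIPE_NAMES = [
    "Crawfish Bisque",
    "Crayfish Soup or Bisque",
    "Crabs and Crayfish",
    "Pumpkin Pie",
]


def expand_term(term):
    """Return the term together with any synonyms recorded for it."""
    term = term.lower()
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


def search_with_synonyms(term, names):
    """Return names that contain the term or any of its synonyms."""
    variants = expand_term(term)
    return [name for name in names
            if any(variant in name.lower() for variant in variants)]


print(search_with_synonyms("crawfish", RECIPE_NAMES))
```

With the synonym group in place, a search for either spelling returns the same three recipes, which is the funneling behavior the site lacks.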
The Feeding America search engine cannot interpret Boolean logic. A search for<br />
“‘chocolate’ NOT ‘cake’” rendered a syntax-error screen, while a search for “chocolate<br />
NOT cake” yielded a blank result screen.<br />
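For comparison, a very small interpreter for queries of the form “chocolate NOT cake” might look like the Python sketch below. It is an assumption-laden illustration: it supports only AND (also implied between bare terms) and NOT, with no OR, phrases, or parentheses, and the three titles are invented sample data rather than items from the collection.

```python
# Invented sample titles, used only to exercise the query logic.
SAMPLE_TITLES = ["Chocolate Cake", "Chocolate Pudding", "Sponge Cake"]


def boolean_search(query, titles):
    """Evaluate a flat query of terms joined by AND / NOT (hypothetical syntax)."""
    required, excluded = [], []
    negate_next = False
    for token in query.split():
        if token == "NOT":
            negate_next = True
        elif token == "AND":
            continue  # AND is implied between bare terms
        else:
            (excluded if negate_next else required).append(token.lower())
            negate_next = False
    return [
        title for title in titles
        if all(term in title.lower() for term in required)
        and not any(term in title.lower() for term in excluded)
    ]


print(boolean_search("chocolate NOT cake", SAMPLE_TITLES))
```

Here the query keeps titles that mention chocolate and drops any that also mention cake, so only “Chocolate Pudding” remains.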
There is no way to do a generalized keyword search (i.e., across multiple fields). Even<br />
though each text has built-in metadata to mark its “value” content, or subject (or<br />
aboutness), the only way to find a resource by topic is to browse the list by topics of<br />
interest. These lists rely on hand-coding, not search results, to present relevant titles.<br />
Because of this, a user might be able to search for all books with a certain keyword in the<br />
title, but would miss certain titles that share the same subject but do not have that<br />
keyword in the title. So, for instance, a user might search for “sweets” and come up with
Seventy-Five Receipts [sic] for Pastry, Cakes, and Sweetmeats, but miss Chocolate and<br />
Cocoa Recipes by Miss Parloa. The only way to find something by topic or subject is to<br />
use the Browse page. However, even the Browse-by-interest page for “Baking &<br />
Confectionary”, which includes the above two titles, does not include The Complete<br />
Confectioner. Relying on the Feeding America library’s Search options to find an item<br />
based on subject may result in poor recall because of the lack of across-field searching, as<br />
well as the apparent lack of a thesaurus to funnel search terms into a controlled<br />
vocabulary. On the other hand, relying on the library’s Browse pages means tolerating a<br />
certain amount of human error.<br />
It is odd that the FA library would not allow searching by topic, considering that its DTD<br />
allows for subject assignment at multiple levels of each book’s structural hierarchy<br />
(chapter, section, sub-section, recipe, etc.). Similarly, the library’s browse function does<br />
not take advantage of the aforementioned topic-driven metadata, as the<br />
browse-by-interest lists are hand-coded.<br />
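To show how the existing subject metadata could drive a topic search, here is a minimal Python sketch that reads “class1” and “class2” attributes from an XML fragment. Everything about the fragment is an assumption made for illustration: the element names `cookbook`, `section`, and `heading` are hypothetical stand-ins (not the project's actual tag names), and the subject values assigned to each recipe are invented, although the values themselves come from the vocabulary quoted earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; only the class1/class2 attribute names and their
# values ("soups", "meatfishgame", "eggcheesedairy") come from the DTD
# description above.
SAMPLE_XML = """
<cookbook>
  <section class1="soups"><heading>Crawfish Bisque</heading></section>
  <section class1="meatfishgame"><heading>Entrecote Creole</heading></section>
  <section class1="eggcheesedairy" class2="meatfishgame">
    <heading>Scrambled Eggs with Crawfish Tails</heading>
  </section>
</cookbook>
"""


def find_by_topic(xml_text, topic):
    """Return headings of sections whose class1 or class2 equals the topic."""
    root = ET.fromstring(xml_text.strip())
    return [
        section.findtext("heading")
        for section in root.iter("section")
        if topic in (section.get("class1"), section.get("class2"))
    ]


print(find_by_topic(SAMPLE_XML, "meatfishgame"))
```

A lookup like this would retrieve sections by assigned subject regardless of the words in their titles, which is the gap that the hand-coded lists currently paper over.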
Comparison<br />
The University of Wisconsin Digital Collections’ History of Science and Technology<br />
digital library is also a book-oriented digital library. Like Feeding America, History of<br />
Science creates both a “digital facsimile” of an original print book and a<br />
transcribed full-text version, the latter produced using OCR. History of Science also follows Dublin Core<br />
standards, but uses TEI (a derivative of SGML) as its markup language. (Guidelines)<br />
The History of Science and Technology collection also offers browse and search options.<br />
Like Feeding America, the Browse option offers a list of available titles, each entry also<br />
serving as a link to a separate webpage.<br />
However, History of Science’s search interface provides far more search options. As<br />
with Feeding America, keyword searches for “crawfish” and “crayfish” return two<br />
completely different sets of results. Unlike Feeding America, however, if the user is<br />
aware of both synonyms, she can compose a search that will retrieve results for both<br />
terms at once because History of Science allows Boolean searching using a graphical<br />
interface. History of Science also allows users to formulate proximity searches through a<br />
graphical interface, which Feeding America does not.<br />
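For readers unfamiliar with proximity searching, the idea can be sketched in Python as a test of whether two words occur within a given number of words of each other. This is a generic illustration under simple assumptions (whitespace-and-punctuation tokenizing, exact word matches), not a description of how the History of Science interface is implemented; the function name `within_proximity` is invented.

```python
import re


def within_proximity(text, word_a, word_b, max_distance):
    """Return True if word_a and word_b occur within max_distance words."""
    tokens = re.findall(r"\w+", text.lower())
    positions_a = [i for i, tok in enumerate(tokens) if tok == word_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


recipe_name = "Omelet with Crawfish or Lobsters"
print(within_proximity(recipe_name, "crawfish", "lobsters", 2))  # two words apart
print(within_proximity(recipe_name, "omelet", "lobsters", 2))    # four words apart
```

The first call succeeds and the second fails, which shows how a proximity limit narrows results compared with a plain search for both words.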
One of History of Science’s drawbacks is that, although the collection holds at least two copies of any text<br />
(scanned images and the transcribed text), only scanned versions are available to users<br />
via the website. I am confused by the decision to not make the transcribed text versions<br />
available to users if they exist.
University of Wisconsin Digital Collections’ History of Science and Technology proximity search interface<br />
Conclusion<br />
I was struck by how simple Feeding America’s search mechanisms are. Because I work<br />
mainly with library interfaces, I am accustomed to Boolean search capabilities, proximity<br />
searching, controlled vocabularies, and across-field searching. Feeding America provides<br />
none of these features. However, looking at the collection from the perspective of a<br />
library professional rather than a library user, it may have been wise to keep the<br />
collection’s search mechanisms basic. The collection, after all, consists of a mere 76<br />
works. Users might be able to browse and search the collection as it exists with a fair<br />
amount of success and accuracy. The site’s retrieval abilities might be improved by<br />
adding the above-mentioned features, but perhaps not enough to justify the institution<br />
investing the time and effort to build these mechanisms for such a small collection.
Works Cited<br />
“Digitization.” Feeding America. 2004. Web. 22 Feb. 2011.<br />
http://digital.lib.msu.edu/projects/cookbooks/html/project/project_digitization.html<br />
“Dublin Core Metadata Element Set, Version 1.1.” Dublin Core. 2011. Web. 22 Feb. 2011.<br />
http://www.dublincore.org/documents/dces/<br />
“Encoding Guidelines.” Feeding America. 2004. Web. 22 Feb. 2011.<br />
http://digital.lib.msu.edu/projects/cookbooks/html/project/project_encoding.html<br />
“Feeding America Cookbook .dtd.” Feeding America. 2004. Web. 22 Feb. 2011.<br />
“Feeding America Search Help.” Feeding America. 2004. Web. 22 Feb. 2011.<br />
http://digital.lib.msu.edu/projects/cookbooks/html/searchhelp.html<br />
Gilliland, Anne J. “Setting the Stage.” Introduction to Metadata: Online, version 3.0.<br />
2008. Web. 22 Feb. 2011.<br />
http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.html<br />
“Guidelines for Markup of Electronic Texts.” University of Wisconsin Digital<br />
Collections Center. 16 Feb. 2007. Web. 22 Feb. 2011.<br />
http://uwdcc.library.wisc.edu/resources/etext/TEIGuidelines.shtml<br />
Longborne, Jan. “Introduction.” Feeding America. 2004. Web. 22 Feb. 2011.<br />
http://digital.lib.msu.edu/projects/cookbooks/html/intro_essay.html