
Case study: a book-oriented digital library


Elizabeth Lew
LIS 7410 / Spring 2011
Dr. Wu
elew1@tigers.lsu.edu

Michigan State University Library’s Feeding America: A Case Study

Feeding America (FA) is a library of 76 cookbooks digitized from the Michigan State University Library’s print collection. These 76 were carefully chosen, on the basis of their historical significance, from the 7,000 books related to culinary arts in the library’s holdings (Longborne). Browsing the collection by publication date reveals that the oldest cookbook in the collection dates from 1798 (Amelia Simmons’s American Cookery) and the newest from 1922 (Bertha M. Wood’s Foods of the Foreign-Born in Relation to Health).

Content Creation

Once chosen, the 76 cookbooks were digitized and transcribed. Those in good condition were scanned on a color flatbed scanner at 400 dpi in 24-bit color and saved as compressed .TIF files. Those in brittle condition were scanned with an overhead scanner and saved as 600 dpi, 1-bit .TIF files, though the FA site does not specify whether these were compressed (Digitization). The .TIF masters were then used to produce a .JPG image of each scan and a PDF version of each complete text. Users can download these versions directly from the FA site, and copies of the .TIF masters are available upon request.

Typists then transcribed the texts from the PDFs using Note Tab software. According to the FA website, project overseers chose Note Tab because “it produces only ASCII text and has tag libraries which can be customized by editing one of the .clb files accompanying the program. A miniature tag library has been created for the cookbooks, containing only page breaks, paragraphs, font styles, and an ISO-LAT1 character menu” (Digitization). The typists produced two transcriptions of each text, which were then proofread against each other using software that compared the two.
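The double-keying step can be illustrated with a short Python sketch. It assumes that “comparing the two transcriptions” means a line-by-line diff, here done with the standard library’s difflib; the FA site does not say which software was actually used, and the sample lines are invented for illustration.

```python
import difflib


def compare_transcriptions(first: str, second: str) -> list[str]:
    """Return the lines on which two independent transcriptions disagree.

    Lines prefixed "- " occur only in the first transcription and lines
    prefixed "+ " only in the second; a proofreader would check each one
    against the page image to decide which typist was right.
    """
    diff = difflib.ndiff(first.splitlines(), second.splitlines())
    # Drop unchanged lines ("  ") and ndiff's intraline hint lines ("? ").
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Invented sample text standing in for two typists' work on the same page.
typist_a = "Take a pint of milk.\nBeat three eggs well."
typist_b = "Take a pint of milk.\nBeat three eggs weil."

for line in compare_transcriptions(typist_a, typist_b):
    print(line)
```

Only the second line is reported, once from each transcription; identical lines need no human attention.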

Metadata

Once transcribed, the text of each cookbook was encoded in XML using a DTD formulated especially for the Feeding America project. The entire transcription file for a single cookbook (including the element) is enclosed in the wrapper element assigned to each individual work. Attributes for this element include type (example values: famous, restaurant, charity), chefschool, histperiod, region, ethnicgroup, and occasion (example values: weddings, Thanksgiving, winter) (Encoding).

Project overseers decided to use the Dublin Core standard for the resource description of each work. This description, based on the Dublin Core element set, is located at the beginning of the file, before the cookbook’s body text, and is marked by the tag (Encoding). Dublin Core elements include contributor, coverage (“The spatial or temporal topic of the resource”), creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, and type (Dublin Core).
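To make the Dublin Core description concrete, the Python sketch below builds a small record using the standard Dublin Core namespace. It is not Feeding America’s actual header, whose element names are not reproduced here: the wrapper name record is a placeholder, and the values are taken from the American Cookery entry mentioned earlier.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, str]) -> ET.Element:
    """Build a simple Dublin Core description from element-name/value pairs."""
    record = ET.Element("record")  # placeholder wrapper, not the FA tag name
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


record = dublin_core_record({
    "title": "American Cookery",
    "creator": "Simmons, Amelia",
    "date": "1798",
})
print(ET.tostring(record, encoding="unicode"))
```

Restricting keys to the fixed set of element names is what allows descriptions from different projects to be compared field by field.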

Other metadata is located throughout the cookbook’s body text, marked by the element. The Feeding America DTD also allows for , , and elements under the element. Beneath these elements, and elements can be used for edibles and non-edibles, respectively. Still further, the FA DTD provides for sub-elements such as (to mark uncommon processes mentioned in the text, such as “braise” or “poach”; “stir” and “roast” are too common to note), , and (these would mark textual mentions of uncommon tools such as “ramekin” or “double boiler,” but not “spoon” or “dish”) (Encoding). Elements that describe the hierarchy of sections in each cookbook, from largest to smallest, can carry “class1” and “class2” attributes, which use defined values (like a very small controlled-vocabulary thesaurus) to describe types of food. Examples of possible values include “eggcheesedairy”, “soups”, “fruitvegbeans”, and “meatfishgame” (DTD).
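Because the class1 values form a small controlled vocabulary, they can be queried directly. The Python sketch below assumes a simplified, hypothetical markup: the element names cookbook, section, and recipe are placeholders rather than the FA DTD’s own, and the assignment of recipes to classes is invented. Only the class1 attribute and its values come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names and class assignments are invented;
# only the class1 attribute and its values follow the FA description.
SAMPLE = """<cookbook>
  <section class1="soups">
    <recipe>Crawfish Bisque</recipe>
  </section>
  <section class1="fruitvegbeans">
    <recipe>Baked Tomatoes, Creole Style</recipe>
  </section>
</cookbook>"""


def recipes_in_class(xml_text: str, food_class: str) -> list[str]:
    """Return the recipe titles found inside elements whose class1 matches."""
    root = ET.fromstring(xml_text)
    titles = []
    # ElementTree's XPath subset supports [@attr='value'] predicates.
    for section in root.iterfind(f".//*[@class1='{food_class}']"):
        titles.extend(recipe.text for recipe in section.iter("recipe"))
    return titles


print(recipes_in_class(SAMPLE, "soups"))
```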

Interface and Pages

The Feeding America portal displays links to the interactive parts of the website in a horizontal toolbar along the top of the page. Below it, news and updates fill the main portion of the page, while sample images from the collection fill the right side.

The toolbar consists of eight links: Home, The Project, The Authors, Browse the Collection, Search the Collection, Museum Objects, Glossary, and Partners.


The Project page describes the Feeding America project and is the source of much of the background information in this paper. It covers the scope and nature of the collection, as well as the process of digitizing, encoding, and building the Feeding America portal. It also provides press links and a 15-minute video that describes the collection and explains why some of the 76 books were chosen for digitization.

Browsing

The Authors, Browse, Museum Objects, and Glossary pages are all presented similarly, as menus. A brief description of the list’s contents appears at the top of the page; directly beneath it, the alphabet is laid out as a menu, with each letter linking to the corresponding section of the list. Below the alphabet menu the alphabetized list itself begins, and it can also be scrolled through in full.

The Glossary is a list of culinary terms and offers little further interaction. The definition of each term directly follows its alphabetized entry, and only a few entries link to other pages.

Every entry on the Browse, Authors, and Museum Objects pages links to a unique page for that entry. The Authors menu lets users browse an alphabetical list of cookbook authors; each author’s page references the cookbook she or he wrote and gives a brief biography. The Museum Objects page lets users browse an alphabetical list of cooking instruments housed in the Michigan State University Museum. Likewise, each entry links to a page created for that instrument, which includes a photo of the utensil and a description of the object’s history and purpose.

[Figure: Author browse menu]

[Figure: Object browse menu]

The Browse page is devoted to the cookbooks themselves. Initially, the list of entries (each formatted as cookbook title, author, and date of publication) is arranged alphabetically. However, links at the top of the page lead to versions of the list arranged by date and by interest. Arranged by date, the list is broken up into three-year intervals. Arranged by interest, it is broken up into groups such as “Diet, nutrition, health, health movements, vegetarianism”; “Charity and Church cookbooks”; “Hotels, restaurants, etc.”; “Markets and produce”; “Military cooking”; “Regional American Cooking: Midwest, Northeast, South and Border states, and West”; and “Ethnic Influences: African, Asian, Creole, English, French”, and so on. Each of these interest pages in turn pulls up a short list of related titles. These lists are hand-coded rather than generated as sets of retrieved results.
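Grouping by date amounts to a simple bucketing step. The Python sketch below assumes that the intervals begin at the earliest publication year; the site’s actual boundaries are not documented, and among the sample years only 1798 and 1922 come from the collection.

```python
from collections import defaultdict


def group_by_interval(years: list[int], width: int = 3) -> dict[str, list[int]]:
    """Group publication years into consecutive ranges of `width` years.

    Assumption: the first range starts at the earliest year in the list.
    """
    start = min(years)
    groups: dict[str, list[int]] = defaultdict(list)
    for year in sorted(years):
        low = start + (year - start) // width * width
        groups[f"{low}-{low + width - 1}"].append(year)
    return dict(groups)


print(group_by_interval([1798, 1800, 1801, 1922]))
```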


Searching

The Search page offers four search options: Book Author, Book Title, Recipe Name, and Ingredients. The search function does not offer a way to browse results, whether or not a query succeeds. Author, Title, and Recipe searches all accept multi-word search strings such as “chocolate cake” (Search Help). They also allow searching by partial name: for instance, an author search for “Merritt” retrieves a link to “Farmer, Fannie Merritt.” Keyword searching also works for books and recipes. For example, a title search for “Creole” retrieves links to both Cooking in Old Creole Days and La Cuisine Creole, while a recipe search for “Creole” retrieves titles such as “Entrecote Creole” (a steak dish), “Sauce Creolle”, and “Baked Tomatoes, Creole Style.”


Pros and Cons

The retrieval mechanism matches a search term anywhere within a phrase, a title, or even a single word. This is useful when the user remembers a keyword that does not happen to fall at the beginning of a title. However, the same behavior can be frustrating, because the term may match a fragment of an unrelated word. For instance, a search for “Pie” returns “Pumpkin Pie” (useful) as well as “Artichokes Stuffed with Meat (Carciofi ripieni di carne)” (not useful), because the letters “pie” occur inside “ripieni.”
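The difference between this substring matching and whole-word matching can be shown in a few lines of Python. This illustrates the behavior described above; it is not Feeding America’s own search code, which is not documented.

```python
import re

TITLES = [
    "Pumpkin Pie",
    "Artichokes Stuffed with Meat (Carciofi ripieni di carne)",
]


def substring_search(term: str, titles: list[str]) -> list[str]:
    """Match the term anywhere, even inside another word (case-insensitive)."""
    return [t for t in titles if term.lower() in t.lower()]


def whole_word_search(term: str, titles: list[str]) -> list[str]:
    """Match the term only where it stands as a complete word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [t for t in titles if pattern.search(t)]


print(substring_search("Pie", TITLES))   # both titles
print(whole_word_search("Pie", TITLES))  # only "Pumpkin Pie"
```

Whole-word matching removes the “ripieni” false hit but has its own cost: a search for “cake” would no longer find a hypothetical title such as “Pancakes,” so the choice trades precision against recall.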

Another problem with the Feeding America retrieval mechanism is that there is no thesaurus to funnel synonyms into a single search term. A search for crawfish returns recipes for “Ecrevisses en buisson. Boiled Crawfish”, “Ecrevisses Bordelaise. Crawfish a la Bordelaise”, “Oeufs brouisses aux queues d’ecrevisse. Scrambled Eggs with Crawfish Tails”, “Omelette Celestine. Omelet with Crawfish or Lobsters”, “Bisque of Crab or Crawfish”, and “Crawfish Bisque”. Meanwhile, a search for crayfish turns up a completely different set of recipes: “Crayfish Bisque a la Creole”, “Crayfish Bisque. A Creole Dish”, “Crayfish Soup or Bisque”, “Crayfish-Broth for Purifying the Blood”, and “Crabs and Crayfish”. Searching for one ingredient under each of its two names thus yields two separate sets of results.
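A thesaurus of this kind can be as simple as a synonym ring that maps each variant to a preferred term and expands the query before matching. The Python sketch below is hypothetical: the mapping is hand-written for this one example, and matching is by substring, as on the FA site.

```python
# Hypothetical synonym ring: every variant maps to one preferred term.
PREFERRED = {"crawfish": "crawfish", "crayfish": "crawfish"}


def expand(term: str) -> set[str]:
    """Return the query term plus every variant sharing its preferred term."""
    key = term.lower()
    preferred = PREFERRED.get(key, key)
    return {key} | {v for v, p in PREFERRED.items() if p == preferred}


def search(term: str, titles: list[str]) -> list[str]:
    """Substring search that matches any variant of the query term."""
    variants = expand(term)
    return [t for t in titles if any(v in t.lower() for v in variants)]


TITLES = ["Crawfish Bisque", "Crayfish Soup or Bisque", "Pumpkin Pie"]
print(search("crayfish", TITLES))  # both bisques, whichever spelling they use
```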

The Feeding America search engine cannot interpret Boolean logic. A search for “'chocolate' NOT 'cake'” produced a syntax-error screen, while a search for “chocolate NOT cake” yielded a blank results screen.
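For comparison, a minimal Boolean evaluator over a list of titles might look like the Python sketch below. It is a hypothetical illustration, not any library system’s query parser: it handles only single-word terms joined by AND, OR, or NOT, evaluated strictly left to right without parentheses, and the titles are invented.

```python
def boolean_search(query: str, titles: list[str]) -> list[str]:
    """Evaluate 'term OP term OP ...' left to right; OP is AND, OR, or NOT.

    Each term matches case-insensitively as a substring of a title.
    """
    tokens = query.split()
    if len(tokens) % 2 == 0:
        raise ValueError("expected: term (OP term)*")

    def hits(term: str) -> set[int]:
        return {i for i, t in enumerate(titles) if term.lower() in t.lower()}

    result = hits(tokens[0])
    for op, term in zip(tokens[1::2], tokens[2::2]):
        matched = hits(term)
        if op == "AND":
            result &= matched
        elif op == "OR":
            result |= matched
        elif op == "NOT":
            result -= matched
        else:
            raise ValueError(f"unknown operator: {op}")
    return [titles[i] for i in sorted(result)]


# Hypothetical titles for illustration.
TITLES = ["Chocolate Cake", "Chocolate Caramels", "Sponge Cake"]
print(boolean_search("chocolate NOT cake", TITLES))  # ['Chocolate Caramels']
```

The same evaluator would answer “crawfish OR crayfish” in a single query.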

There is no way to do a generalized keyword search (that is, one across multiple fields). Even though each text has built-in metadata marking its “value” content, or subject (its aboutness), the only way to find a resource by topic is to browse the lists of topics of interest, and these lists rely on hand-coding rather than search results to present relevant titles. As a result, a user might be able to find every book with a certain keyword in its title but would miss books that share the same subject without having that keyword in the title. For instance, a user might search for “sweets” and find Seventy-Five Receipts [sic] for Pastry, Cakes, and Sweetmeats, but miss Chocolate and Cocoa Recipes by Miss Parloa. The only way to find something by topic or subject is to use the Browse page. However, even the browse-by-interest page for “Baking & Confectionary”, which includes the two titles above, does not include The Complete Confectioner. Relying on the Feeding America library’s Search options to find an item by subject may therefore result in poor recall, because of both the lack of cross-field searching and the apparent lack of a thesaurus to funnel search terms into a controlled vocabulary. On the other hand, relying on the library’s Browse pages means tolerating a certain amount of human error.

It is odd that the FA library does not allow searching by topic, considering that its DTD allows subjects to be assigned at multiple levels of each book’s structural hierarchy (chapter, section, sub-section, recipe, etc.). Nor does the library’s browse function take advantage of this topic-driven metadata, since the browse-by-interest lists are hand-coded.
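The gap between title-only and cross-field searching can be sketched in Python as follows. The subject terms attached to each book are hypothetical stand-ins for the topic metadata the FA DTD supports; the titles are shortened from those discussed above, and the query is “sweet” so that it matches inside “Sweetmeats.”

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    # Hypothetical subject terms standing in for topic metadata in the markup.
    subjects: list[str] = field(default_factory=list)


def title_search(term: str, books: list[Book]) -> list[str]:
    """Match the term against titles only, as a title search does."""
    term = term.lower()
    return [b.title for b in books if term in b.title.lower()]


def cross_field_search(term: str, books: list[Book]) -> list[str]:
    """Match the term against the title or any subject term."""
    term = term.lower()
    return [
        b.title
        for b in books
        if term in b.title.lower() or any(term in s.lower() for s in b.subjects)
    ]


BOOKS = [
    Book("Seventy-Five Receipts for Pastry, Cakes, and Sweetmeats", ["sweets", "baking"]),
    Book("Chocolate and Cocoa Recipes", ["sweets", "chocolate"]),
]
print(title_search("sweet", BOOKS))        # only the first title
print(cross_field_search("sweet", BOOKS))  # both titles
```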

Comparison

The University of Wisconsin Digital Collections’ History of Science and Technology collection is another book-oriented digital library. Like Feeding America, History of Science creates both a “digital facsimile” of an original print book and a full-text version transcribed using OCR. History of Science also follows Dublin Core standards, but uses TEI (a derivative of SGML) as its markup language (Guidelines).

The History of Science and Technology collection also offers browse and search options. As in Feeding America, the Browse option offers a list of available titles, each entry serving as a link to a separate webpage.

However, History of Science’s search interface provides far more options. As with Feeding America, keyword searches for “crawfish” and “crayfish” return two completely different sets of results. Unlike Feeding America, however, a user who is aware of both synonyms can compose a single search that retrieves results for both terms at once, because History of Science supports Boolean searching through a graphical interface. History of Science also lets users formulate proximity searches through a graphical interface, which Feeding America does not.
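A proximity search asks whether two words occur within a set number of words of each other. The Python sketch below shows the idea on a single recipe title from the crawfish results; it is a generic illustration, not History of Science’s implementation.

```python
import re


def within(text: str, a: str, b: str, distance: int) -> bool:
    """True if words a and b occur within `distance` word positions of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= distance for i in positions_a for j in positions_b)


title = "Omelette Celestine. Omelet with Crawfish or Lobsters"
print(within(title, "omelet", "lobsters", 4))  # True: four positions apart
print(within(title, "omelet", "lobsters", 3))  # False
```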

One of History of Science’s drawbacks is that although the collection holds at least two versions of every text (the scanned images and the transcribed text), only the scanned versions are available to users via the website. I am puzzled by the decision not to make the transcribed versions available to users if they exist.


[Figure: University of Wisconsin Digital Collections’ History of Science and Technology proximity search interface]

Conclusion

I was struck by how simple Feeding America’s search mechanisms are. Because I work mainly with library interfaces, I am accustomed to Boolean search capabilities, proximity searching, controlled vocabularies, and cross-field searching. Feeding America provides none of these features. However, looking at the collection from the perspective of a library professional rather than a library user, it may have been wise to keep the collection’s search mechanisms basic. The collection, after all, consists of a mere 76 works, and users might be able to browse and search it as it stands with a fair amount of success and accuracy. Adding the features mentioned above might improve the site’s retrieval abilities, but perhaps not enough to justify the time and effort the institution would need to invest in building them for such a small collection.


Works Cited

“Digitization.” Feeding America. 2004. Web. 22 Feb. 2011. http://digital.lib.msu.edu/projects/cookbooks/html/project/project_digitization.html

“Dublin Core Metadata Element Set, Version 1.1.” Dublin Core. 2011. Web. 22 Feb. 2011. http://www.dublincore.org/documents/dces/

“Encoding Guidelines.” Feeding America. 2004. Web. 22 Feb. 2011. http://digital.lib.msu.edu/projects/cookbooks/html/project/project_encoding.html

“Feeding America Cookbook .dtd.” Feeding America. 2004. Web. 22 Feb. 2011.

“Feeding America Search Help.” Feeding America. 2004. Web. 22 Feb. 2011. http://digital.lib.msu.edu/projects/cookbooks/html/searchhelp.html

Gilliland, Anne J. “Setting the Stage.” Introduction to Metadata: Online, version 3.0. 2008. Web. 22 Feb. 2011. http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.html

“Guidelines for Markup of Electronic Texts.” University of Wisconsin Digital Collections Center. 16 Feb. 2007. Web. 22 Feb. 2011. http://uwdcc.library.wisc.edu/resources/etext/TEIGuidelines.shtml

Longborne, Jan. “Introduction.” Feeding America. 2004. Web. 22 Feb. 2011. http://digital.lib.msu.edu/projects/cookbooks/html/intro_essay.html
