01.01.2015 Views

Spotlight on Spotlight - Carol Smith Home Page


Smith 8

1. The target HTML page was opened in a web browser.

2. Because a straight copy/paste would have captured undesirable information and hyperlinks extraneous to the message, each page was then reloaded via the page’s ‘Printer-friendly’ hyperlink.

3. The message was copied in its entirety using cmd-a/cmd-c keyboard shortcuts (Macintosh).

4. Using the cmd-v keyboard shortcut (Macintosh), the message was then pasted into a new plain text document, using Apple’s TextEdit application.

5. Two corrections were made to each plain text document:

a. The phrase “Return to Message” (a hyperlink in the original page) was deleted from the end of each document.

b. To thwart web crawlers, the original HTML pages present e-mail addresses as .gif images. For this reason, each author’s e-mail address had to be entered manually.

6. Each plain text file was then saved to the hard drive.

7. A small percentage of message board postings were accompanied by .jpg attachments, typically scanned documents relating to the message. Each of these attachments (22 in all) was saved as a separate data set file. Each attachment first had to be loaded into a separate browser window; for some unknown reason, without this extra step attachments could only be saved as .gif images, even though the extension of the attachment indicated it was a .jpg file.
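Correction 5a was made by hand in this study. Purely as an illustration, the same clean-up could be sketched in Python as follows. The function name and the decision to touch only a trailing occurrence of the phrase are assumptions for this sketch, not part of the procedure described above.

```python
# Hypothetical helper: the study performed this correction manually.
TRAILING_LINK_TEXT = "Return to Message"


def strip_trailing_link(text: str, phrase: str = TRAILING_LINK_TEXT) -> str:
    """Remove `phrase` only when it is the last non-blank content of `text`.

    Occurrences elsewhere in the message body are left alone, since they
    could be part of what the author actually wrote.
    """
    stripped = text.rstrip()
    if stripped.endswith(phrase):
        stripped = stripped[: -len(phrase)].rstrip()
    # Normalise to exactly one trailing newline.
    return stripped + "\n"
```

Restricting the match to the end of the document mirrors the wording of step 5a, which describes the phrase as appearing at the end of each document.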

After some consideration, it was decided to name each text file sequentially, beginning with 001, 002, 003, etc. If an initial message board posting received replies, all postings in that thread were given the same number but distinguished by sequential letters; e.g., 001a, 001b, 001c, etc. Some thought was given as to whether file names should indicate the level of depth in a particular thread; that is, if a posting was the second reply to a reply of an initial posting, label it 001aab. This level of complexity was deemed unnecessary, however, as any thread in question could be easily located in its original web location, should the sequence of postings become of interest.
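The naming scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, a posting without replies is assumed to keep the bare number, every posting in a multi-posting thread (including the initial one) is assumed to receive a letter, and threads longer than 26 postings are simply rejected because the text does not say how they were handled.

```python
import string


def thread_file_names(thread_sizes):
    """Return file names for threads, given the number of postings in each.

    A thread with one posting is named by its number alone (e.g. "001").
    A thread with several postings shares one number, with each posting
    distinguished by a sequential letter (e.g. "002a", "002b", "002c").
    """
    names = []
    for number, size in enumerate(thread_sizes, start=1):
        if size < 1:
            raise ValueError("a thread must contain at least one posting")
        if size > len(string.ascii_lowercase):
            # Assumption: the scheme for more than 26 postings is unspecified.
            raise ValueError("more than 26 postings in a thread is not covered")
        base = f"{number:03d}"
        if size == 1:
            names.append(base)
        else:
            names.extend(base + letter for letter in string.ascii_lowercase[:size])
    return names
```

For example, `thread_file_names([1, 3])` yields `["001", "002a", "002b", "002c"]`. Reply depth is deliberately not encoded, in line with the decision described above.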

Data Set Issues

As described in the Functional Analysis section below, two decisions made during the creation of the initial data set proved problematic, and required further data set modification:

1. Because Mac files do not require extensions (.txt, .doc, etc.), extensions were not initially entered during the file-naming step.

2. Documents were initially saved to separate sub-folders for each of the Internet message boards (i.e., “Ancestry-Minnick”; “Ancestry-Minick”; “Ancestry-Minck”; “Ancestry-Minnich”; “Ancestry-Minich”; “Ancestry-Mink”). Attachments were further segregated into folders within these folders, labeled “Ancestry-Minnick-Images”, etc. Finally, all six subfolders were contained within a single top-level folder labeled “Spotlight Data Set.”
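If extensions were to be added after the fact, one possible approach is sketched below in Python. It is not the procedure used in this study, which is described in the Functional Analysis section. The function name, the choice of `.txt`, and the rule of skipping any folder whose name ends in `-Images` are assumptions based on the folder layout described above.

```python
from pathlib import Path


def add_missing_extensions(root, extension=".txt", skip_suffix="-Images"):
    """Append `extension` to extensionless files under `root`.

    Files inside folders whose names end with `skip_suffix` (the image
    folders) and hidden files such as .DS_Store are left untouched.
    Returns the list of new paths.
    """
    root = Path(root)
    renamed = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name.startswith("."):
            continue
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part.endswith(skip_suffix) for part in parent_parts):
            continue
        if path.suffix:
            continue  # already has an extension
        target = path.with_name(path.name + extension)
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append(target)
    return renamed
```

Called as `add_missing_extensions("Spotlight Data Set")`, the sketch would turn a file such as `Ancestry-Minnick/001a` into `Ancestry-Minnick/001a.txt` while leaving the image folders alone.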

It should also be noted that the fielded format of the documents in their original web format
