

MEASURING THE EFFECTIVENESS OF ASSIGNING CODES: RECALL AND PRECISION

Recall and precision are two measurements that are useful for determining the appropriateness of a set of assigned codes or keywords. The case study on coding news stories, for instance, assigns many codes to each story. Recall and precision can be used to evaluate these assignments.

Recall answers the question: “How many of the correct codes did MBR assign to the story?” It is the ratio of codes assigned by MBR that are correct (as verified by editors) to the total number of correct codes on the story. If MBR assigns all available codes to every story, then recall is 100 percent, because the correct codes all get assigned, along with many irrelevant ones. If MBR assigns no codes to any story, then recall is 0 percent.

Precision answers the question: “How many of the codes assigned by MBR were correct?” It is the ratio of the number of correct codes assigned by MBR to the total number of codes MBR assigned. Precision is 100 percent when MBR assigns only correct codes to a story. It is close to 0 percent when MBR assigns all codes to every story.
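
In set terms, both measurements are simple ratios over the assigned and the correct code sets. Here is a minimal sketch in Python (illustrative, not from the original text; the function names are our own):

def recall(assigned, correct):
    """Fraction of the correct codes that MBR actually assigned."""
    return len(assigned & correct) / len(correct) if correct else 0.0

def precision(assigned, correct):
    """Fraction of the codes assigned by MBR that are correct."""
    return len(assigned & correct) / len(assigned) if assigned else 0.0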

Neither recall nor precision alone gives the full picture of how good the classification is. Ideally, we want 100 percent recall and 100 percent precision. Often, it is possible to trade one off against the other. For instance, using more neighbors increases recall but decreases precision; raising the threshold increases precision but decreases recall. Table 8.5 gives some insight into these measurements for a few specific cases.

Table 8.5 Examples of Recall and Precision

CODES BY MBR       CORRECT CODES   RECALL   PRECISION
A,B,C,D            A,B,C,D         100%     100%
A,B                A,B,C,D         50%      100%
A,B,C,D,E,F,G,H    A,B,C,D         100%     50%
E,F                A,B,C,D         0%       0%
A,B,E,F            A,B,C,D         50%      50%
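
As a check, the rows of Table 8.5 can be reproduced directly with the recall and precision helpers sketched above (again illustrative):

rows = [
    ({"A", "B", "C", "D"},                     {"A", "B", "C", "D"}),
    ({"A", "B"},                               {"A", "B", "C", "D"}),
    ({"A", "B", "C", "D", "E", "F", "G", "H"}, {"A", "B", "C", "D"}),
    ({"E", "F"},                               {"A", "B", "C", "D"}),
    ({"A", "B", "E", "F"},                     {"A", "B", "C", "D"}),
]
for assigned, correct in rows:
    # Prints 100%/100%, 50%/100%, 100%/50%, 0%/0%, 50%/50%,
    # matching the five rows of the table.
    print(f"recall={recall(assigned, correct):.0%}, "
          f"precision={precision(assigned, correct):.0%}")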

The original codes assigned to the stories by individual editors had a recall of 83 percent and a precision of 88 percent with respect to the validated set of correct codes. For MBR, the recall was 80 percent and the precision 72 percent. These figures, however, are averages across all the categories; as Table 8.6 shows, MBR did significantly better in some of the categories.

