10.11.2016 Views

Learning Data Mining with Python


The result is shown in the next graph. It can be seen quite clearly that the main source of error is U being mistaken for an H nearly every single time!
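As a rough sketch of how such a dominant confusion could be read off programmatically rather than from the graph, the following counts the (true letter, predicted letter) pairs that disagree. The function name `most_common_confusions` and the label lists are hypothetical and made up for illustration; they are not the chapter's data or results.

```python
from collections import Counter


def most_common_confusions(true_letters, predicted_letters, top_n=3):
    """Return the most frequent (true, predicted) pairs among the errors."""
    errors = Counter(
        (true, pred)
        for true, pred in zip(true_letters, predicted_letters)
        if true != pred  # keep only misclassified letters
    )
    return errors.most_common(top_n)


# Made-up labels for illustration only (assumption, not real results).
true_letters = list("UUUUHHABU")
predicted_letters = list("HHHHHHABU")

# Each entry is ((true letter, predicted letter), number of occurrences).
print(most_common_confusions(true_letters, predicted_letters))
```

With real predictions, a single pair dominating this list would correspond to the one bright off-diagonal cell in the graph.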

Chapter 8

The letter U appears in 17 percent of the words in our list, and for each word that contains a U we can expect the predicted word to be wrong. U actually appears more often than H (which is in around 11 percent of words), indicating that we could get a cheap (although possibly not robust) boost in accuracy by changing any H prediction into a U.
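The following is a minimal sketch of that cheap fix, assuming uppercase words. The helper names `letter_word_share` and `swap_h_for_u` and the ten-word list are hypothetical stand-ins, not the chapter's code or dictionary, so the shares printed here will not match the 17 and 11 percent figures above.

```python
def letter_word_share(words, letter):
    """Fraction of words that contain `letter` at least once."""
    return sum(letter in word for word in words) / len(words)


def swap_h_for_u(predicted_word):
    """Naive post-processing: treat every predicted H as a U."""
    return predicted_word.replace("H", "U")


# Tiny made-up word list for illustration (assumption, not the real list).
words = ["BULB", "HUSH", "DUST", "HEAT", "MILK",
         "TREE", "FUND", "CHAT", "PLAN", "GOLD"]

print(letter_word_share(words, "U"))  # share of words containing U
print(letter_word_share(words, "H"))  # share of words containing H

# The swap repairs a U that was misread as H...
print(swap_h_for_u("DHST"))
# ...but it also corrupts words that really do contain an H.
print(swap_h_for_u("HEAT"))
```

The second call shows why the boost is not robust: every genuine H is now wrong, so the trick only pays off while U is misread as H more often than H occurs.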

In the next section, we will do something a bit smarter and actually use the dictionary to search for similar words.

