

Topic Modeling

Alternatively, we can look at the least talked about topic:

>>> words = model.show_topic(counts.argmin(), 64)

The least talked about topic is the former French colonies in Central Africa. Just 1.5 percent of documents touch upon it, and it accounts for 0.08 percent of the words. Had we performed this exercise on the French Wikipedia, we would probably have obtained a very different result.
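As a minimal sketch of how such per-topic figures might be computed, the snippet below counts how many documents touch each topic and picks the least used one with `argmin()`. The `doc_topics` matrix, the `threshold` cutoff, and the `topic_usage` helper are all hypothetical and invented for illustration; the share of total topic weight is used here only as a rough stand-in for the share of words.

```python
import numpy as np

# Hypothetical document-topic weights (assumption, not real data):
# rows are documents, columns are topics, each row sums to 1.
doc_topics = np.array([
    [0.70, 0.00, 0.30],
    [0.55, 0.05, 0.40],
    [0.00, 0.02, 0.98],
    [0.90, 0.00, 0.10],
])


def topic_usage(doc_topics, threshold=0.01):
    """Summarize how much each topic is used across the collection.

    `threshold` is an assumed cutoff: a document "touches" a topic
    when its weight for that topic exceeds this value.
    """
    used = doc_topics > threshold
    counts = used.sum(axis=0)                      # documents per topic
    doc_share = counts / doc_topics.shape[0]       # fraction of documents
    weight_share = doc_topics.sum(axis=0) / doc_topics.sum()
    return counts, doc_share, weight_share


counts, doc_share, weight_share = topic_usage(doc_topics)
least = int(counts.argmin())  # index of the least talked about topic
print(least, doc_share[least], weight_share[least])
```

With a trained model, the index returned by `counts.argmin()` is what would be passed to `model.show_topic` to inspect the words of that topic.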

Choosing the number of topics

So far, we have used a fixed number of topics, namely 100. This number was purely arbitrary; we could just as well have used 20 or 200 topics. Fortunately, for many users, this number does not really matter. If you are only going to use the topics as an intermediate step, as we did previously, the final behavior of the system is rarely very sensitive to the exact number of topics. As long as you use enough topics, whether 100 or 200, the recommendations that result from the process will not be very different. One hundred is often a good number (while 20 is too few for a general collection of text documents). The same is true of setting the alpha (α) value: while playing around with it can change the topics, the final results are again robust against this change.

Topic modeling is often a means to an end rather than a goal in itself. In that case, it is not always important exactly which parameters you choose. Different numbers of topics or values for parameters such as alpha will result in systems whose end results are almost identical.

