08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho



Chapter 4

Here, corpus is just the preloaded list of words:

>>> model = models.ldamodel.LdaModel(
...     corpus,
...     num_topics=100,
...     id2word=corpus.id2word)

This single call builds a topic model. We can explore the topics in many ways. To see the list of topics a document refers to, we use the model[doc] syntax:

>>> topics = [model[c] for c in corpus]
>>> print(topics[0])
[(3, 0.023607255776894751),
 (13, 0.11679936618551275),
 (19, 0.075935855202707139),
 (92, 0.10781541687001292)]

I elided some of the output, but the format is a list of pairs (topic_index, topic_weight). We can see that only a few topics are used for each document. The topic model is a sparse model: although there are many possible topics for each document, only a few of them are used. We can plot a histogram of the number of topics per document, as shown in the following graph:
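The counts behind such a histogram can be sketched in a few lines. This is only an illustration: the topics list below is a small hand-written stand-in for the real [model[c] for c in corpus] result, with invented weights, and the names topics_per_doc and histogram are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for `topics = [model[c] for c in corpus]`:
# one list of (topic_index, topic_weight) pairs per document.
# The weights are invented for illustration only.
topics = [
    [(3, 0.02), (13, 0.12), (19, 0.08), (92, 0.11)],
    [(7, 0.50), (13, 0.30)],
    [(1, 0.25), (40, 0.25), (92, 0.40)],
    [(5, 0.90)],
]

# Number of topics used by each document (length of its sparse list).
topics_per_doc = [len(doc_topics) for doc_topics in topics]

# Histogram: how many documents use exactly k topics.
histogram = Counter(topics_per_doc)

# Print a crude text histogram, one row per topic count.
for k in sorted(histogram):
    print("%2d topics: %s" % (k, "#" * histogram[k]))
```

With a real model, topics_per_doc could instead be passed to a plotting routine such as matplotlib's hist function to draw the graph.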
