12.07.2015 Views

The dissertation of Andreas Stolcke is approved: University of ...


CHAPTER 7. N-GRAMS FROM STOCHASTIC CONTEXT-FREE GRAMMARS

Let us examine these systems of equations one more time. Each can be written in matrix notation in the form

$$(I - A)\, c = b \qquad (7.6)$$

where $I$ is the identity matrix, $c = (c_X)$ represents the vector of unknowns, $A = (a_{XU})$ is a coefficient matrix, and $b = (b_X)$ is the right-hand side vector. All of these are indexed by nonterminals $X, U$. We get

$$c_X = c(w \subset X) \qquad (7.7)$$

$$b_X = P(X \to w) + \sum_{X \to YZ} P(X \to YZ) \sum_{k=1}^{n-1} P(Y \Rightarrow_R w_1 \ldots w_k)\, P(Z \Rightarrow_L w_{k+1} \ldots w_n) \qquad (7.8)$$

$$a_{XU} = \sum_{X \to YZ} P(X \to YZ)\, \bigl(\delta(Y, U) + \delta(Z, U)\bigr)$$

where $\delta(X, Y) = 1$ if $X = Y$, and 0 otherwise. The expression $I - A$ arises from bringing the variables $c(w \subset Y)$ and $c(w \subset Z)$ to the other side in equation (7.3) in order to collect the coefficients.

We can see that all dependencies on the particular bigram, $w$, are in the right-hand side vector $b$, while the coefficient matrix $I - A$ depends only on the grammar. This, together with the standard method of LU decomposition (see, e.g., Press et al. (1988)), enables us to solve for each bigram in time $O(N^2)$, rather than the $O(N^3)$ standard for a full system ($N$ being the number of nonterminals/variables). The LU decomposition itself is cubic, but is incurred only once. The full computation is therefore dominated by the quadratic effort of solving the system for each $n$-gram. Furthermore, the quadratic cost is a worst-case figure that would be incurred only if the grammar contained every possible rule; empirically this computation is linear in the number of nonterminals for grammars that are sparse, i.e., where each nonterminal makes reference only to a bounded number of other nonterminals (independent of the total grammar size).

7.5 Consistency of SCFGs

Blindly applying the $n$-gram algorithm (and many others) to a SCFG with arbitrary probabilities can lead to surprising results. Consider the following simple grammar

$$S \to a \;\; [p] \qquad\qquad S \to S\,S \;\; [q] \qquad (7.9)$$

What is the expected frequency of unigram $a$? Using the abbreviation $c = c(a \subset S)$ and equation 7.5, we see that $P(S \to a) = p$ and $c = p + 2qc$.
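The factor-once, solve-per-n-gram scheme described earlier can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not taken from the dissertation: the three-nonterminal toy grammar (S → A B [1.0], A → a [1.0], B → b [0.6], B → A B [0.4]), the hand-computed right-hand sides, and the helper names `lu_factor` and `lu_solve`, which are plain-Python stand-ins for a library LU routine. The coefficient matrix I - A is factored a single time (cubic cost), after which each n-gram costs only a forward and a back substitution (quadratic cost).

```python
def lu_factor(m):
    """In-place style LU factorization with partial pivoting.

    Returns (lu, perm): lu holds L below the diagonal (unit diagonal
    implied) and U on and above it; perm[i] is the original row index
    that ended up in row i.
    """
    n = len(m)
    lu = [row[:] for row in m]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest entry in column k as pivot.
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]              # multiplier, stored in L
            for j in range(k + 1, n):
                lu[r][j] -= lu[r][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve the factored system for one right-hand side b."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):                        # forward substitution, L y = P b
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):              # back substitution, U x = y
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


# Hypothetical toy grammar, nonterminals ordered (S, A, B).
# a[X][U] = sum over rules X -> Y Z of P(rule) * (delta(Y,U) + delta(Z,U)).
a = [
    [0.0, 1.0, 1.0],   # S -> A B [1.0]
    [0.0, 0.0, 0.0],   # A has only a terminal rule
    [0.0, 0.4, 0.4],   # B -> A B [0.4]
]
n = len(a)
i_minus_a = [[(1.0 if r == c else 0.0) - a[r][c] for c in range(n)]
             for r in range(n)]

# Cubic step, done once per grammar.
lu, perm = lu_factor(i_minus_a)

# One right-hand side per n-gram (values worked out by hand for this toy grammar).
rhs = {
    "a":   [0.0, 1.0, 0.0],    # P(X -> a)
    "b":   [0.0, 0.0, 0.6],    # P(X -> b)
    "a b": [0.6, 0.0, 0.24],   # sum of P(X -> Y Z) * P(Y ends in a) * P(Z starts with b)
}

# Quadratic step, repeated per n-gram; entry 0 is the count under S.
expected = {ngram: lu_solve(lu, perm, b)[0] for ngram, b in rhs.items()}
for ngram, count in expected.items():
    print(ngram, round(count, 4))
```

The toy grammar generates strings of the form a...ab, so the expected count of "b" and of "a b" under S should both come out at about 1, and that of "a" at about 5/3, which gives a quick sanity check on the solver. In practice a library LU routine would replace the two helpers.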
