
The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS

6.6.3 Efficient parsing with large sparse grammars

During work with a moderate-sized, application-specific natural language grammar taken from the BeRP system (Jurafsky et al. 1994b) we had opportunity to optimize our implementation of the algorithm. Below we relate some of the lessons learned in the process.

6.6.3.1 Speeding up matrix inversions

Both prediction and completion steps make use of a matrix $R$ defined as a geometric series derived from a matrix $P$,

$$R = I + P + P^2 + \cdots = (I - P)^{-1}.$$

Both $P$ and $R$ are indexed by the nonterminals in the grammar. The matrix $P$ is derived from the SCFG rules and probabilities (either the left-corner relation or the unit-production relation).

For an application using a fixed grammar the time taken by the precomputation of left-corner and unit-production matrices may not be crucial, since it occurs off-line. There are cases, however, when that cost should be minimal, e.g., when rule probabilities are iteratively reestimated.

Even if the matrix $P$ is sparse, the matrix inversion can be prohibitive for large numbers of nonterminals $n$. Empirically, matrices of rank $n$ with a bounded number $t$ of non-zero entries in each row (i.e., $t$ is independent of $n$) can be inverted in time $O(n^2)$, whereas a full matrix of size $n \times n$ would require time $O(n^3)$.

In many cases the grammar has a relatively small number of nonterminals that have productions involving other nonterminals in a left-corner (or the RHS of a unit-production). Only those nonterminals can have non-zero contributions to the higher powers of the matrix $P$. This fact can be used to substantially reduce the cost of the matrix inversion needed to compute $R$.

Let $P'$ be a subset of the entries of $P$, namely, only those elements indexed by nonterminals that have a non-empty row in $P$. For example, for the left-corner computation, $P'$ is obtained from $P$ by deleting all rows and columns indexed by nonterminals that do not have productions starting with nonterminals. Let $I'$ be the identity matrix over the same set of nonterminals as $P'$. Then $R$ can be computed as

$$
\begin{aligned}
R &= I + (I' + P' + {P'}^2 + \cdots) \ast P \\
  &= I + (I' - P')^{-1} \ast P \\
  &= I + R' \ast P .
\end{aligned}
$$

Here $R'$ is the inverse of $I' - P'$, and $\ast$ denotes a matrix multiplication in which the left operand is first augmented with zero elements to match the dimensions of the right operand, $P$.

The speedups obtained with this technique can be substantial. For a grammar with 789 nonterminals, of which only 132 have nonterminal productions, the left-corner matrix was computed in 12 seconds (including the final multiply with $P$ and addition of $I$). Inversion of the full matrix $I - P$ took 4 minutes 28 seconds. [17]

[17] These figures are not very meaningful for their absolute values. All measurements were obtained on a Sun SPARCstation 2 with a
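The reduced inversion described in this section can be illustrated with a minimal sketch in plain Python. It is not the implementation used for the measurements above. The six-nonterminal grammar, its probabilities, and the names `P`, `closure_full`, `closure_reduced`, and `invert` are all hypothetical, chosen only for illustration. Here `P` stands for the left-corner (or unit-production) probability matrix, and the closure being computed is the geometric series I + P + P^2 + ..., i.e. the inverse of I - P. The sketch inverts only the submatrix indexed by nonterminals with a non-empty row, then multiplies the zero-padded result by `P` and adds the identity.

```python
# Hypothetical toy grammar (assumption, not from the source):
#   S  -> NP VP   [1.0]
#   NP -> Det N   [0.6]
#   NP -> NP N    [0.4]   (left-recursive)
#   VP -> V NP    [1.0]
# Det, N, V expand only to terminals, so their rows in P are empty.
NAMES = ["S", "NP", "VP", "Det", "N", "V"]
P = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # S
    [0.0, 0.4, 0.0, 0.6, 0.0, 0.0],  # NP
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # VP
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # Det
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # N
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # V
]


def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + identity(n)[i] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def closure_full(p):
    """Compute R = (I - P)^-1 by inverting the full n x n matrix."""
    n = len(p)
    eye = identity(n)
    return invert([[eye[i][j] - p[i][j] for j in range(n)] for i in range(n)])


def closure_reduced(p):
    """Compute R = I + R' * P, inverting only the non-empty-row submatrix."""
    n = len(p)
    # Nonterminals whose row in P has at least one non-zero entry.
    keep = [i for i in range(n) if any(x != 0.0 for x in p[i])]
    k = len(keep)
    eye_sub = identity(k)
    # I' - P', restricted to rows and columns in `keep`.
    diff = [[eye_sub[a][b] - p[keep[a]][keep[b]] for b in range(k)]
            for a in range(k)]
    r_sub = invert(diff)  # R' = (I' - P')^-1
    # Zero-padded multiply: only rows and columns in `keep` contribute.
    r = identity(n)
    for a, i in enumerate(keep):
        for b, m in enumerate(keep):
            weight = r_sub[a][b]
            if weight != 0.0:
                for j in range(n):
                    r[i][j] += weight * p[m][j]
    return r


if __name__ == "__main__":
    for name, row in zip(NAMES, closure_reduced(P)):
        print(name, [round(x, 4) for x in row])
```

In this toy case only three of the six nonterminals have non-empty rows, so the inversion runs on a 3 x 3 matrix instead of a 6 x 6 one. The final multiply touches only the rows of `P` that are non-empty, which is where the saving would come from on a larger sparse grammar.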
