Introduction to Computational Linguistics

14. Digression: Time Complexity

two blocks of four into one block of eight, and so on. How many comparisons are needed to merge two ordered lists M and N of length m and n, respectively, into a list L of length m + n? The answer is: at most m + n − 1. The idea is as follows. We take the first elements, $M_0$ and $N_0$. Then $L_0$ is the smaller of the two, which is then removed from its list. The next element is again obtained by comparing the first elements of the lists, and so on. For example, let M = [1, 3, 5] and N = [2, 4, 5]. We compare the first two elements. The smaller one is put into L, and the lists are now

(137) M = [3, 5], N = [2, 4, 5], L = [1]

Now we compare the first elements of M and N. The smaller element is 2, and it is put into L:

(138) M = [3, 5], N = [4, 5], L = [1, 2]

This is how the algorithm goes on:

(139) M = [5], N = [4, 5], L = [1, 2, 3]
(140) M = [5], N = [5], L = [1, 2, 3, 4]
(141) M = [], N = [5], L = [1, 2, 3, 4, 5]
(142) M = [], N = [], L = [1, 2, 3, 4, 5, 5]

Each time we put an element into L we need just one comparison, except for the last element, which can be put in without further checking. If we want to avoid repetitions, then we need to check each element against the last member of L before putting it in (this increases the number of checks by m + n − 1).

In the first step we have n/2 many blocks, and n/2 comparisons are made to order them. The next step takes 3n/4 comparisons, the third step needs 7n/8, and so on. Let us round the numbers somewhat: each step certainly needs fewer than n comparisons. How often do we have to merge? This number is $\log_2 n$, the number $x$ such that $2^x = n$. So in total we need fewer than $n \log_2 n$ comparisons. We show this number in comparison to $n$ and $n^2/2$:

(143)
               n = 2^0   n = 2^4   n = 2^8   n = 2^12     n = 2^16
n                    1        16       256       4096        65536
n log_2 n            0        64      2048      49152      1048576
n^2/2              1/2       128     32768    8388608   2147483648
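The merge step and the bottom-up sorting scheme (blocks of one into blocks of two, two into four, and so on) can be sketched in OCaml as follows. This is a minimal sketch assuming integer lists; the names `merge`, `merge_pairs` and `msort` are hypothetical and not taken from these notes. Each function returns its result together with the number of comparisons made, so the counts can be checked against the figures above. Note that the counter stops as soon as one list runs out, so m + n − 1 is the worst case: merging [1; 2] with [3; 4] needs only 2 comparisons.

```ocaml
(* Merge two ordered lists m and n into one ordered list,
   counting how many comparisons are made along the way. *)
let merge (m : int list) (n : int list) : int list * int =
  let rec go m n acc count =
    match m, n with
    | [], rest | rest, [] ->
        (* One list is exhausted: the rest of the other list is
           appended without any further comparison. *)
        (List.rev_append acc rest, count)
    | x :: m', y :: n' ->
        (* One comparison decides which head goes into L next. *)
        if x <= y then go m' n (x :: acc) (count + 1)
        else go m n' (y :: acc) (count + 1)
  in
  go m n [] 0

(* One pass: merge adjacent blocks pairwise (blocks of one into
   blocks of two, two into four, and so on). *)
let rec merge_pairs (blocks : int list list) : int list list * int =
  match blocks with
  | b1 :: b2 :: rest ->
      let merged, c = merge b1 b2 in
      let rest', c' = merge_pairs rest in
      (merged :: rest', c + c')
  | _ -> (blocks, 0)   (* zero or one block left: nothing to merge *)

(* Bottom-up merge sort: start with one-element blocks and merge
   until a single block remains. Returns the sorted list and the
   total number of comparisons. *)
let msort (l : int list) : int list * int =
  let rec loop blocks count =
    match blocks with
    | [] -> ([], count)
    | [b] -> (b, count)
    | _ ->
        let blocks', c = merge_pairs blocks in
        loop blocks' (count + c)
  in
  loop (List.map (fun x -> [x]) l) 0

let () =
  let l, c = merge [1; 3; 5] [2; 4; 5] in
  (* prints: merged 6 elements with 5 comparisons *)
  Printf.printf "merged %d elements with %d comparisons\n" (List.length l) c
```

On a list of 16 elements the total count returned by `msort` stays below 16 · log_2 16 = 64, in line with the estimate $n \log_2 n$.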
Consider again your computer. On an input of length 65536 (= $2^{16}$) it takes one second under the algorithm just described, while the naive algorithm would require it to run for 2048 seconds, which is more than half an hour.

In practice, one does not want to spell out in painful detail how many steps an algorithm consumes. Therefore, a simplifying notation is used. One writes that a problem is in O(n) if there is a constant C such that from some $n_0$ on, for an input of length n the algorithm takes at most C · n steps to compute the solution. (One says that the estimate holds for ‘almost all’ inputs if it holds only from a certain point onwards.) This notation makes sense also in view of the fact that it is not clear how much time an individual step takes, so that the time consumption cannot really be measured in seconds (which is what is really of interest for us). If tomorrow computers compute twice as fast, everything runs in half the time.

Notice that O(bn + a) = O(bn) = O(n). It is worth understanding why. First, assume that n ≥ a. Then (b + 1)n = bn + n ≥ bn + a. This means that for almost all n: (b + 1)n ≥ bn + a. Next, O((b + 1)n) = O(n), since O((b + 1)n) effectively means that there is a constant C such that for almost all n the complexity is ≤ C(b + 1)n. Now put D := C(b + 1). Then there is a constant (namely D) such that for almost all n the complexity is ≤ Dn. Hence the problem is in O(n). Likewise, $O(cn^2 + bn + a) = O(n^2)$, and so on. In general, the term with the highest exponent eventually outgrows all the others by any given margin. Polynomial complexity is therefore measured only in terms of the leading exponent. This makes calculations much simpler.

15 Finite State Transducers

Finite state transducers are similar to finite state automata. You can think of them as finite state automata that leave a trace of their actions in the form of a string. However, the more popular way is to think of them as translation devices with finite memory.
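A finite state transducer is a sextuple

(144) $T = \langle A, B, Q, i_0, F, \delta \rangle$

where A and B are alphabets, Q is a finite set (the set of states), $i_0$ the initial state, F the set of final states, and

(145) $\delta \subseteq A_\varepsilon \times Q \times B_\varepsilon \times Q$

with $A_\varepsilon := A \cup \{\varepsilon\}$ and $B_\varepsilon := B \cup \{\varepsilon\}$.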
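The definition can be made concrete with a small OCaml sketch. It assumes integer states, characters as letters, and `None` for ε; it also assumes that a tuple (a, q, b, q') in δ means "in state q, read a, write b, go to q'", which is one plausible reading of the tuple order rather than something stated here. The record fields and the function `accepts` are hypothetical names. `accepts t u v` decides whether T can read the input u while writing exactly v and stop in a final state. Because there are only finitely many configurations (state, position in u, position in v), recording the visited ones guarantees termination even when δ contains ε:ε loops.

```ocaml
(* A transition (a, q, b, q') is read here as: in state q, consume a
   from the input, emit b on the output and move to state q'.
   None stands for the empty string ε.  This reading of the tuple
   order and the integer states are assumptions of this sketch. *)
type transition = char option * int * char option * int

(* The sextuple <A, B, Q, i0, F, delta>; A, B and Q are left implicit
   (characters and integers).  Field names are hypothetical. *)
type transducer = {
  initial : int;            (* i0 *)
  finals : int list;        (* F *)
  delta : transition list;  (* delta, a finite relation *)
}

(* Can t read all of u while writing exactly v and stop in a final
   state?  A configuration is (state, position in u, position in v).
   There are only finitely many, so recording visited ones makes the
   search terminate even if delta contains ε:ε loops. *)
let accepts (t : transducer) (u : string) (v : string) : bool =
  let nu = String.length u and nv = String.length v in
  (* Advance position i in s over an optional symbol, if it matches. *)
  let advance sym s n i =
    match sym with
    | None -> Some i                        (* ε: stay where we are *)
    | Some c -> if i < n && s.[i] = c then Some (i + 1) else None
  in
  let visited = Hashtbl.create 64 in
  let rec search (q, i, j) =
    if Hashtbl.mem visited (q, i, j) then false
    else begin
      Hashtbl.add visited (q, i, j) ();
      (i = nu && j = nv && List.mem q t.finals)
      || List.exists
           (fun (a, p, b, q') ->
             p = q
             && (match advance a u nu i, advance b v nv j with
                 | Some i', Some j' -> search (q', i', j')
                 | _ -> false))
           t.delta
    end
  in
  search (t.initial, 0, 0)

(* Example: a single state that rewrites a as b and deletes c. *)
let example = {
  initial = 0;
  finals = [0];
  delta = [ (Some 'a', 0, Some 'b', 0); (Some 'c', 0, None, 0) ];
}
```

With this transducer, `accepts example "aca" "bb"` holds, while `accepts example "aa" "bbb"` does not. The search is a plain depth-first reachability check; it answers whether a pair belongs to the relation computed by T rather than enumerating all outputs, which keeps it finite even when ε-transitions could produce infinitely many.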