13.11.2014 Views

Introduction to Computational Linguistics


Suppose that the first character is at position 1230. Then the string 129.023 matches the first bracket and the string 145.110 the second bracket. These strings can be recalled using the function matched_group. It takes as input a number n and the original string, and it returns the string matched by the nth bracket. So, if directly after the match on the string assigned to u we define

(104) let s = "The first half of the IP address is " ^ (Str.matched_group 1 u)

we get the following value for s:<br />

(105) "The first half of the IP address is 129.023"
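The recall of bracketed groups can be tried out with a small self-contained sketch. The regular expression `ip` and the input string `u` below are assumptions made for illustration (the passage does not show them); only `Str.matched_group` and the two halves of the address come from the text. The program needs to be linked against the Str library.

```ocaml
(* Assumed pattern: two bracketed groups, each of the form digits.digits,
   separated by a dot. In Str, brackets are written \( and \). *)
let ip = Str.regexp "\\([0-9]+\\.[0-9]+\\)\\.\\([0-9]+\\.[0-9]+\\)"

(* Hypothetical input string containing the address from the text. *)
let u = "The server is at 129.023.145.110 today"

(* search_forward returns the start position of the first match
   and raises Not_found if there is none. *)
let _pos = Str.search_forward ip u 0

(* Directly after the match, the groups can be recalled by number. *)
let first_half = Str.matched_group 1 u
let second_half = Str.matched_group 2 u

let s = "The first half of the IP address is " ^ first_half

let () = print_endline s
```

Note that `matched_group` refers to the most recent match, so it should be called before any other matching function is run.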

To use this in an automated string replacement procedure, one uses the variables \\0, \\1, \\2,..., \\9. After a successful match, \\0 is assigned the entire matched string, \\1 the string matched by the first bracket, \\2 the string matched by the second bracket, and so on. A template is a string that in place of characters also contains these variables (but nothing more). The function global_replace takes as input a regular expression and two strings. The first string is used as a template. Whenever a match is found in the second string, the template is used to execute the replacement. For example, to cut the IP address to its first half, we write the template "\\1". If we want to replace the original IP address by its first part followed by .0.1, then we use "\\1.0.1". If we want to replace the second part by the first, we use "\\1.\\1".
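The three replacements just described can be sketched as follows, with each template written out in full. As before, the regular expression `ip` and the input string `u` are assumptions chosen for illustration; `Str.global_replace` takes the regular expression, the template, and the string to be searched, in that order.

```ocaml
(* Assumed pattern with two bracketed groups, one per half of the address. *)
let ip = Str.regexp "\\([0-9]+\\.[0-9]+\\)\\.\\([0-9]+\\.[0-9]+\\)"

(* Hypothetical input string. *)
let u = "ping 129.023.145.110 now"

(* Cut the address to its first half. *)
let cut = Str.global_replace ip "\\1" u

(* First half followed by .0.1 *)
let dot01 = Str.global_replace ip "\\1.0.1" u

(* Replace the second half by the first. *)
let doubled = Str.global_replace ip "\\1.\\1" u

let () = List.iter print_endline [cut; dot01; doubled]
```

In an OCaml string literal the backslash must itself be escaped, which is why the variable for the first group is typed as `\\1`.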

12 Finite State Automata

A finite state automaton is a quintuple

(106) 𝔄 = 〈A, Q, i₀, F, δ〉

where A, the alphabet, is a finite set, Q, the set of states, is also a finite set, i₀ ∈ Q is the initial state, F ⊆ Q is the set of final or accepting states and, finally, δ ⊆ Q × A × Q is the transition relation. We write x →ᵃ y if 〈x, a, y〉 ∈ δ.
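One possible way to represent such a quintuple in OCaml is a record with one field per component. This is only a sketch: the type `fsa`, the field names, the choice of integers for states and characters for letters, and the example automaton `m` are all assumptions made here. The function `accepts` uses the usual acceptance condition (some run from the initial state ends in a final state), which this passage has not yet stated.

```ocaml
(* The quintuple <A, Q, i0, F, delta>, with ints as states and chars as letters. *)
type fsa = {
  alphabet : char list;             (* A *)
  states : int list;                (* Q *)
  initial : int;                    (* i0, a member of Q *)
  final : int list;                 (* F, a subset of Q *)
  delta : (int * char * int) list;  (* a subset of Q x A x Q *)
}

(* The arrow from x to y labelled a holds iff <x, a, y> is in delta. *)
let arrow m x a y = List.mem (x, a, y) m.delta

(* All states reachable from some state in xs by one a-transition. *)
let step m xs a =
  List.sort_uniq compare
    (List.filter_map
       (fun (x, b, y) -> if b = a && List.mem x xs then Some y else None)
       m.delta)

(* Assumed acceptance condition: after reading w from i0,
   at least one of the reached states is final. *)
let accepts m w =
  let reached = Seq.fold_left (step m) [m.initial] (String.to_seq w) in
  List.exists (fun q -> List.mem q m.final) reached

(* Hypothetical example: strings over {a, b} that end in b. *)
let m = {
  alphabet = ['a'; 'b'];
  states = [0; 1];
  initial = 0;
  final = [1];
  delta = [(0, 'a', 0); (0, 'b', 0); (0, 'b', 1)];
}

let () = Printf.printf "%b %b\n" (accepts m "ab") (accepts m "ba")
```

Because δ is a relation rather than a function, `step` works on sets of states: from state 0 the letter b leads both to 0 and to 1.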
