20.07.2013 Views

Notes on computational linguistics.pdf - UCLA Department of ...


Stabler - Lx 185/209 2003

8.1.5 Random variables

(44) A random (or stochastic) variable on a probability space $(\Omega, 2^{\Omega}, P)$ is a function $X : \Omega \to \mathbb{R}$.

(45) Any set of numbers $A \in 2^{\mathbb{R}}$ determines (or “generates”) an event, a set of outcomes, namely $X^{-1}(A) = \{e \mid X(e) \in A\}$.

(46) So then, for example, $P(X^{-1}(A)) = P(\{e \mid X(e) \in A\})$ is the probability of an event, as usual.

(47) Many texts use the notation $X \in A$ for an event, namely, $\{e \mid X(e) \in A\}$.

So $P(X \in A)$ is just $P(X^{-1}(A))$, which is just $P(\{e \mid X(e) \in A\})$.

Sometimes you also see $P\{X \in A\}$, with the same meaning.

(48) Similarly, for some $s \in \mathbb{R}$, it is common to see $P(X = s)$, where $X = s$ is the event $\{e \mid X(e) = s\}$.

(49) The range of $X$ is sometimes called the sample space of the stochastic variable $X$, written $\Omega_X$.

$X$ is discrete if $\Omega_X$ is finite or countably infinite. Otherwise it is continuous.
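
As a concrete illustration of (44) to (49), the following Python sketch builds a small finite probability space and a random variable on it. The fair die, the parity-indicator choice for X, and the names `omega`, `P`, `X`, `preimage`, and `prob` are all assumptions made for this example; none of them come from the notes.

```python
from fractions import Fraction

# Hypothetical sample space: the six faces of a fair die.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event & omega), len(omega))

def X(e):
    """A random variable X : omega -> R; here 1.0 for even faces, 0.0 for odd."""
    return 1.0 if e % 2 == 0 else 0.0

def preimage(rv, A):
    """The event generated by the set of numbers A: {e | rv(e) in A}."""
    return {e for e in omega if rv(e) in A}

def prob(rv, A):
    """P(X in A), defined as the probability of the preimage of A."""
    return P(preimage(rv, A))

# The range of X, i.e. the sample space of the random variable.
omega_X = {X(e) for e in omega}

print(preimage(X, {1.0}))  # the event "X = 1"
print(prob(X, {1.0}))      # P(X = 1)
print(omega_X)
```

Since `omega_X` is finite here, this X is a discrete random variable in the sense of (49).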

• Why do things this way? What is the purpose of these functions $X$?

The answer is: the functions $X$ just formalize the classification of events, the sets of outcomes that we are interested in, as explained in (45) and (48).

This is a standard way to name events, and once you are practiced with the notation, it is convenient.

The events are classified numerically here, that is, they are named by real numbers, but when the set of events $\Omega_X$ is finite or countable, obviously we could name these events with any finite or countable set of names.
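
The point that a random variable merely classifies outcomes can be sketched in a few lines of Python. The example assumes a hypothetical uniform space of two coin flips and a random variable that counts heads (neither is from the notes). It groups outcomes by the name X gives them, then relabels the classes with arbitrary non-numeric names to show that the events and their probabilities are unchanged.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical sample space: all sequences of two coin flips, equally likely.
omega = list(product("HT", repeat=2))

def X(e):
    """Number of heads in the outcome e."""
    return e.count("H")

def classify(rv):
    """Group outcomes by the name rv gives them: name -> {e | rv(e) = name}."""
    classes = defaultdict(set)
    for e in omega:
        classes[rv(e)].add(e)
    return dict(classes)

def distribution(rv):
    """P(rv = s) for every s in the range of rv, under the uniform measure."""
    return {s: Fraction(len(ev), len(omega)) for s, ev in classify(rv).items()}

# The same classification with non-numeric names (an arbitrary relabeling).
labels = {0: "none", 1: "one", 2: "both"}

def Y(e):
    """Names the same events as X, using labels instead of numbers."""
    return labels[X(e)]

print(distribution(X))
print(distribution(Y))
```

The two classifications pick out exactly the same sets of outcomes; only the names differ.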

8.1.6 Stochastic processes and n-gram models of language users

(50) A stochastic process is a function $X$ from times (or “indices”) to random variables.

If the time is continuous, then $X : \mathbb{R} \to [\Omega \to \mathbb{R}]$, where $[\Omega \to \mathbb{R}]$ is the set of random variables.

If the time is discrete, then $X : \mathbb{N} \to [\Omega \to \mathbb{R}]$.

(51) For stochastic processes $X$, instead of the usual argument notation $X(t)$, we use subscripts $X_t$, to avoid confusion with the arguments of the random variables.

So $X_t$ is the value of the stochastic process $X$ at time $t$, a random variable.

When time is discrete, for $t = 0, 1, 2, \ldots$ we have the sequence of random variables $X_0, X_1, X_2, \ldots$

(52) We will consider primarily discrete time stochastic processes, that is, sequences $X_0, X_1, X_2, \ldots$ of random variables.

So $X_i$ is a random variable, namely the one that is the value of the stochastic process $X$ at time $i$.

(53) $X_i = q$ is interpreted as before as the event (now understood as occurring at time $i$) which is the set of outcomes $\{e \mid X_i(e) = q\}$.

So, for example, $P(X_i = q)$ is just a notation for the probability, at time $i$, of an outcome that is named $q$ by $X_i$, that is, $P(X_i = q)$ is short for $P(\{e \mid X_i(e) = q\})$.
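
To make (50) to (53) concrete, here is a minimal Python sketch of a discrete-time process. It assumes a hypothetical sample space of three-word strings over a two-word vocabulary, with an illustrative measure in which each position is filled independently. The random variable at time i names the word at position i by its index in the vocabulary. The vocabulary, the probabilities, and the truncation to three time steps are all assumptions for the example, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical vocabulary; each word is named by its index.
vocab = ["the", "dog"]
LENGTH = 3

# Sample space: every string of LENGTH words.
omega = list(product(vocab, repeat=LENGTH))

# Illustrative measure (an assumption): each position is independently
# "the" with probability 2/3 and "dog" with probability 1/3.
word_prob = {"the": Fraction(2, 3), "dog": Fraction(1, 3)}

def P(event):
    """Probability of an event, given as a collection of outcomes from omega."""
    total = Fraction(0)
    for e in event:
        p = Fraction(1)
        for w in e:
            p *= word_prob[w]
        total += p
    return total

def X(i):
    """The stochastic process: maps a time i to the random variable at time i."""
    def X_i(e):
        return vocab.index(e[i])
    return X_i

def prob_eq(i, q):
    """P(X_i = q), short for P({e | X_i(e) = q})."""
    X_i = X(i)
    return P([e for e in omega if X_i(e) == q])

for i in range(LENGTH):
    print(i, prob_eq(i, 0), prob_eq(i, 1))
```

Note that `X` returns a function for each index, which mirrors the type of a discrete-time process in (50): a map from times to random variables.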

(54) Notice that it would make perfect sense for all the variables in the sequence to be identical, $X_0 = X_1 = X_2 = \ldots$. In that case, we still think of the process as one that occurs in time, with the same classification of outcomes available at each time.

Let’s call a stochastic process time-invariant (or stationary) iff all of its random variables are the same function. That is, for all $q, q' \in \mathbb{N}$, $X_q = X_{q'}$.
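
The definition of time-invariance in (54) compares random variables as functions, so on a finite sample space it can be checked extensionally. The sketch below does this for two hypothetical processes, one constant in time and one whose naming of outcomes rotates with the index. Both processes and the finite horizon are assumptions for illustration; checking finitely many indices gives evidence, not a proof, for a process defined on all of N.

```python
# Hypothetical finite sample space of outcomes.
omega = ["a", "b", "c"]

def constant_process(i):
    """The random variable at time i is the same function for every i."""
    return lambda e: float(omega.index(e))

def shifting_process(i):
    """The random variable at time i rotates the names by i positions."""
    return lambda e: float((omega.index(e) + i) % len(omega))

def same_function(f, g):
    """Extensional equality of two random variables on omega."""
    return all(f(e) == g(e) for e in omega)

def time_invariant_up_to(process, horizon):
    """Check that the variable at every time q < horizon equals the one at time 0."""
    X0 = process(0)
    return all(same_function(process(q), X0) for q in range(horizon))

print(time_invariant_up_to(constant_process, 10))
print(time_invariant_up_to(shifting_process, 10))
```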

(55) A finite stochastic process $X$ is one where the sample space of all the stochastic variables,

$$\Omega_X = \bigcup_{i=0}^{\infty} \Omega_{X_i}$$

is finite.

The elements of $\Omega_X$ name events, as explained in (45) and (48), but in this context the elements of $\Omega_X$ are often called states.
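
The set of states in (55) is the union of the ranges of the individual random variables. The following sketch computes it for a hypothetical process over two-letter outcomes, where the variable at time i names the letter at position i by its character code. Because the outcomes have only two positions, the infinite union in the definition is replaced here by a union over the indices 0 and 1; that truncation, the sample space, and the naming scheme are assumptions for the example.

```python
# Hypothetical sample space: outcomes are two-letter strings.
omega = ["ab", "ac"]

def X(i):
    """The random variable at time i names the letter at position i by its code."""
    return lambda e: float(ord(e[i]))

def range_of(rv):
    """The sample space of one random variable: the values it takes on omega."""
    return {rv(e) for e in omega}

def states(process, indices):
    """Union of the ranges of the process's random variables over the indices."""
    result = set()
    for i in indices:
        result |= range_of(process(i))
    return result

omega_X = states(X, range(2))
print(sorted(omega_X))
```

Since `omega_X` is a finite set, this truncated process counts as finite in the sense of (55), and its elements are the states.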

Markov chains<br />
