Notes on computational linguistics.pdf - UCLA Department of ...
Stabler - Lx 185/209 2003
8.1.5 Random variables
(44) A random (or stochastic) variable on probability space (Ω, 2^Ω, P) is a function X : Ω → R.
(45) Any set of numbers A ∈ 2^R determines (or “generates”) an event, a set of outcomes, namely X⁻¹(A) = {e | X(e) ∈ A}.
(46) So then, for example, P(X⁻¹(A)) = P({e | X(e) ∈ A}) is the probability of an event, as usual.
(47) Many texts use the notation X ∈ A for an event, namely, {e | X(e) ∈ A}.
So P(X ∈ A) is just P(X⁻¹(A)), which is just P({e | X(e) ∈ A}).
Sometimes you also see P{X ∈ A}, with the same meaning.
(48) Similarly, for some s ∈ R, it is common to see P(X = s), where X = s is the event {e | X(e) = s}.
(49) The range of X is sometimes called the sample space of the stochastic variable X, Ω_X.
X is discrete if Ω_X is finite or countably infinite. Otherwise it is continuous.
• Why do things this way? What is the purpose of these functions X?
The answer is: the functions X just formalize the classification of events, the sets of outcomes that we are interested in, as explained in (45) and (48).
This is a standard way to name events, and once you are practiced with the notation, it is convenient.
The events are classified numerically here, that is, they are named by real numbers, but when the set of events Ω_X is finite or countable, obviously we could name these events with any finite or countable set of names.
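To make definitions (44) through (49) concrete, here is a minimal sketch in Python. The probability space (one roll of a fair die), the particular random variable X, and the helper names prob, preimage and sample_space are all assumptions chosen for illustration; none of them come from the notes.

```python
from fractions import Fraction

# Hypothetical probability space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """P(event) for a subset of OMEGA under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def X(e):
    """A random variable X : OMEGA -> R; here 1.0 if the roll is even, else 0.0."""
    return 1.0 if e % 2 == 0 else 0.0

def preimage(rv, A):
    """The event X^-1(A) = {e | X(e) in A}, as in (45)."""
    return frozenset(e for e in OMEGA if rv(e) in A)

def sample_space(rv):
    """The range of the random variable, written Omega_X in (49)."""
    return frozenset(rv(e) for e in OMEGA)

even_event = preimage(X, {1.0})  # the event "X = 1" from (48)
p_even = prob(even_event)        # P(X = 1), that is, P(X^-1({1.0}))
```

Here X classifies the six outcomes into two named events, and because Ω_X is finite, the names 0.0 and 1.0 could equally be replaced by labels such as "odd" and "even".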
8.1.6 Stochastic processes and n-gram models of language users
(50) A stochastic process is a function X from times (or “indices”) to random variables.
If the time is continuous, then X : R → [Ω → R], where [Ω → R] is the set of random variables.
If the time is discrete, then X : N → [Ω → R].
(51) For stochastic processes X, instead of the usual argument notation X(t), we use subscripts X_t, to avoid confusion with the arguments of the random variables.
So X_t is the value of the stochastic process X at time t, a random variable.
When time is discrete, for t = 0, 1, 2, ... we have the sequence of random variables X_0, X_1, X_2, ...
(52) We will consider primarily discrete time stochastic processes, that is, sequences X_0, X_1, X_2, ... of random variables.
So X_i is a random variable, namely the one that is the value of the stochastic process X at time i.
(53) X_i = q is interpreted as before as the event (now understood as occurring at time i) which is the set of outcomes {e | X_i(e) = q}.
So, for example, P(X_i = q) is just a notation for the probability, at time i, of an outcome that is named q by X_i, that is, P(X_i = q) is short for P({e | X_i(e) = q}).
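The idea in (50) through (53), that a process maps each time to a random variable, can be sketched as a function returning functions. The die-roll space and the threshold rule used for X_t below are hypothetical choices made only so that the example is small and checkable.

```python
from fractions import Fraction

# Hypothetical probability space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))

def process(t):
    """A discrete-time process X : N -> [OMEGA -> R]; returns the random variable X_t."""
    def X_t(e):
        # Assumed rule for illustration: X_t names an outcome 1.0 if the roll exceeds t.
        return 1.0 if e > t else 0.0
    return X_t

def event(t, q):
    """The event X_t = q, that is, {e | X_t(e) = q}."""
    X_t = process(t)
    return frozenset(e for e in OMEGA if X_t(e) == q)

p = prob(event(3, 1.0))  # P(X_3 = 1): the rolls 4, 5 and 6
```

Note that process(t) takes a time and returns a function of outcomes, which is why the notes write X_t rather than X(t): the subscript is the time, and the argument in X_t(e) is the outcome.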
(54) Notice that it would make perfect sense for all the variables in the sequence to be identical, X_0 = X_1 = X_2 = .... In that case, we still think of the process as one that occurs in time, with the same classification of outcomes available at each time.
Let’s call a stochastic process time-invariant (or stationary) iff all of its random variables are the same function. That is, for all q, q′ ∈ N, X_q = X_q′.
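As a rough illustration of this definition, the sketch below compares random variables pointwise on a small outcome set. Both example processes and the helper names are assumptions, and since the index set N is infinite, the check only covers times below a chosen horizon; it can refute time-invariance but cannot prove it.

```python
# Hypothetical outcome set: one roll of a six-sided die.
OMEGA = frozenset(range(1, 7))

def constant_process(t):
    """Every X_t is the same function: the parity of the roll."""
    return lambda e: float(e % 2)

def threshold_process(t):
    """X_t depends on t: 1.0 if the roll exceeds t, else 0.0."""
    return lambda e: 1.0 if e > t else 0.0

def same_function(f, g):
    """Two random variables on OMEGA are equal iff they agree on every outcome."""
    return all(f(e) == g(e) for e in OMEGA)

def time_invariant_up_to(process, horizon):
    """Check X_0 = X_t for every t < horizon (a finite check only)."""
    first = process(0)
    return all(same_function(first, process(t)) for t in range(horizon))
```

For example, time_invariant_up_to(constant_process, 50) holds, while time_invariant_up_to(threshold_process, 50) fails because X_0 and X_1 already disagree on the outcome 1.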
(55) A finite stochastic process X is one where the sample space of all the stochastic variables,
Ω_X = ⋃_{i=0}^{∞} Ω_{X_i}
is finite.
The elements of Ω_X name events, as explained in (45) and (48), but in this context the elements of Ω_X are often called states.
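The union in (55) can be approximated by collecting the ranges of X_0, X_1, ... up to some horizon. The process below is a hypothetical example, and the result is in general only a subset of Ω_X, since later variables could introduce new states; for this particular process no new states appear once both values have been seen.

```python
# Hypothetical outcome set: one roll of a six-sided die.
OMEGA = frozenset(range(1, 7))

def process(t):
    """Assumed example process: X_t(e) is 1.0 if the roll exceeds t, else 0.0."""
    return lambda e: 1.0 if e > t else 0.0

def range_of(rv):
    """The sample space of one random variable, Omega_{X_i}."""
    return frozenset(rv(e) for e in OMEGA)

def states_up_to(process, horizon):
    """Union of Omega_{X_i} for i < horizon: a finite approximation of Omega_X."""
    states = set()
    for i in range(horizon):
        states |= range_of(process(i))
    return frozenset(states)
```

With this process, X_0 takes only the value 1.0, so states_up_to(process, 1) has a single state, while any horizon of 2 or more yields the two states 0.0 and 1.0.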
Markov chains<br />