
By iterating the above formulas we can arrive at a local optimum for the HMM parameters $a_{i,j}$ and $b_j(O_k)$.
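As a rough illustration of what such an iteration looks like in practice, here is a compact numpy sketch of Baum-Welch-style re-estimation; the toy observation sequence, the random initialization, and the fixed iteration count are all invented for this example and are not from the text:

```python
import numpy as np

O = np.array([0, 1, 0, 0, 1])          # invented toy observation sequence
n, m = 2, 2                             # number of hidden states, output symbols
rng = np.random.default_rng(0)
a = rng.random((n, n)); a /= a.sum(1, keepdims=True)   # a[i, j] = P(state j | state i)
b = rng.random((n, m)); b /= b.sum(1, keepdims=True)   # b[j, k] = P(symbol k | state j)
pi = np.full(n, 1.0 / n)                # initial state distribution

for _ in range(50):                     # iterate the updates toward a local optimum
    T = len(O)
    # Forward and backward recurrences.
    alpha = np.zeros((T, n)); beta = np.zeros((T, n))
    alpha[0] = pi * b[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ a) * b[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = a @ (b[:, O[t + 1]] * beta[t + 1])
    # gamma[t, i]: probability of state i at time t; xi[t, i, j]: of transition i -> j.
    gamma = alpha * beta
    gamma /= gamma.sum(1, keepdims=True)
    xi = alpha[:-1, :, None] * a[None] * (b[:, O[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # Re-estimate a_{i,j} and b_j(O_k) from the expected counts.
    a = xi.sum(0) / gamma[:-1].sum(0)[:, None]
    for k in range(m):
        b[:, k] = gamma[O == k].sum(0) / gamma.sum(0)
    pi = gamma[0]

print(np.round(a, 3), np.round(b, 3))
```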

9.3 Graphical Models, and Belief Propagation

A graphical model is a compact representation of a function of $n$ variables $x_1, x_2, \ldots, x_n$. It consists of a graph, directed or undirected, whose vertices correspond to variables that take on values from some set. In this chapter, we consider the case where the function is a probability distribution and the set of values the variables take on is finite, although graphical models are often used to represent probability distributions with continuous variables. The edges of the graph represent relationships or constraints between the variables.

The directed model represents a joint probability distribution that factors into a product of conditional probabilities:

$$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \text{parents of } x_i)$$

It is assumed that the directed graph is acyclic. The directed graphical model is called a Bayesian or belief network and appears frequently in the artificial intelligence and statistics literature.
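As a concrete illustration of the factorization, here is a minimal Python sketch; the three-variable network and all of its probability tables are invented for this example:

```python
# Structure of an invented network: rain -> sprinkler, (rain, sprinkler) -> wet.
parents = {"rain": [], "sprinkler": ["rain"], "wet": ["rain", "sprinkler"]}

# cpt[var][parent values] = P(var = 1 | parent values); all numbers made up.
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(0,): 0.4, (1,): 0.01},
    "wet": {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99},
}

def joint(assignment):
    """p(x_1,...,x_n) as the product over i of p(x_i | parents of x_i)."""
    p = 1.0
    for var, pars in parents.items():
        key = tuple(assignment[q] for q in pars)
        p1 = cpt[var][key]                       # P(var = 1 | parent values)
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

print(joint({"rain": 1, "sprinkler": 0, "wet": 1}))  # 0.2 * 0.99 * 0.8 = 0.1584
```

Each node contributes exactly one factor: its probability conditioned on the values of its parents.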

The undirected graph model, called a Markov random field, can also represent a joint probability distribution of the random variables at its vertices. In many applications the Markov random field represents a function of the variables at the vertices which is to be optimized by choosing values for the variables.
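To make the optimization view concrete, here is a minimal sketch, assuming a tiny graph with binary variables and hand-picked per-edge terms, that finds a maximizing assignment by brute-force enumeration:

```python
from itertools import product

# Invented graph: the function to optimize is a sum of per-edge terms.
edges = {("x1", "x2"): lambda u, v: 2.0 if u == v else 0.5,
         ("x2", "x3"): lambda u, v: 1.0 if u != v else 0.2}
variables = ["x1", "x2", "x3"]

def score(assignment):
    return sum(f(assignment[u], assignment[v]) for (u, v), f in edges.items())

# Choose values for the variables that maximize the function (2^3 assignments).
best = max((dict(zip(variables, vals)) for vals in product([0, 1], repeat=3)),
           key=score)
print(best, score(best))   # e.g. {'x1': 0, 'x2': 0, 'x3': 1} with score 3.0
```

Exhaustive search is only feasible on toy instances; belief propagation, discussed later in this chapter, exploits the graph structure instead.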

A third model, called the factor model, is akin to the Markov random field, but here the dependency sets have a different structure. In the following sections we describe all these models in more detail.

9.4 Bayesian or Belief Networks

A Bayesian network is a directed acyclic graph where vertices correspond to variables and a directed edge from $y$ to $x$ represents a conditional probability $p(x \mid y)$. If a vertex $x$ has edges into it from $y_1, y_2, \ldots, y_k$, then the conditional probability is $p(x \mid y_1, y_2, \ldots, y_k)$. The variable at a vertex with no in-edges has an unconditional probability distribution. If the value of a variable at some vertex is known, then the variable is called evidence. An important property of a Bayesian network is that the joint probability is given by the product over all nodes of the conditional probability of the node conditioned on all its immediate predecessors.
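A small hedged sketch of how evidence is used, assuming an invented two-node network (disease -> symptom) with made-up probabilities: once the symptom is observed, the conditional probability of the disease follows from the joint by enumerating and normalizing.

```python
def joint(disease, symptom):
    # P(disease) * P(symptom | disease); all numbers invented for illustration.
    p_d = 0.1 if disease else 0.9
    p_s = (0.8 if symptom else 0.2) if disease else (0.05 if symptom else 0.95)
    return p_d * p_s

# Evidence: symptom = 1. Compute P(disease = 1 | symptom = 1) by enumeration.
num = joint(1, 1)
den = sum(joint(d, 1) for d in (0, 1))
print(num / den)   # 0.08 / 0.125 = 0.64
```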

In the example of Fig. 9.1, a patient is ill and sees a doctor. The doctor ascertains the symptoms of the patient and the possible causes such as whether the patient was in

