
Figure 4.16: Representation of the assumed features for the celebrities data. The tree structure on the right shows nine celebrities of three professions: politicians (P1, P2, P3), athletes (A1, A2, A3), and movie stars (M1, M2, M3). Each individual is assumed to have a unique feature and one feature shared with the other celebrities of the same profession. The left panel shows this assumed feature matrix, which is used for training the tEBA model.

many latent features (iEBA).

We compare the predictive performance of each model in a leave-one-out scenario. We train the models on the data of all possible pairwise comparisons except one pair, predict the choice probability for the pair that was left out, and repeat this procedure for all pairs. We evaluate the performance of the models using the loss function $L(\hat{\theta}, \theta) = -\theta \log \hat{\theta} - (1 - \theta) \log(1 - \hat{\theta})$, which expresses the discrepancy between the true probabilities $\theta$ and the predicted probabilities $\hat{\theta}$. The mean of the predicted probabilities minimizes the expected loss. We therefore take this mean as a point estimate and report the negative log likelihood evaluated at the mean of the predictive distribution for each pair.
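
As a concrete illustration of this evaluation, the sketch below computes the held-out loss from posterior predictive samples for a single pair. This is a minimal sketch rather than the code used in the thesis; the function names, the Beta-distributed samples, and the numbers are assumptions for illustration only.

```python
import numpy as np

def log_loss(theta_hat, theta, eps=1e-12):
    """Loss L(theta_hat, theta) for one held-out pair: the discrepancy between
    the empirical choice probability theta and the point prediction theta_hat."""
    theta_hat = np.clip(theta_hat, eps, 1.0 - eps)  # guard against log(0)
    return -theta * np.log(theta_hat) - (1.0 - theta) * np.log(1.0 - theta_hat)

def held_out_loss(theta_empirical, predictive_samples):
    """Score one left-out pair: use the mean of the posterior predictive samples
    as the point estimate, since it minimizes the expected loss."""
    theta_hat = np.mean(predictive_samples)
    return log_loss(theta_hat, theta_empirical)

# Hypothetical usage: one held-out pair with empirical probability 0.7 and
# MCMC samples of the model's predicted choice probability for that pair.
rng = np.random.default_rng(0)
samples = rng.beta(7.0, 3.0, size=1000)
print(held_out_loss(0.7, samples))
```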

The negative log likelihood of each model, averaged over the 36 pairs, is shown in Table 4.2. For better comparison we also report the values for a baseline model that always predicts 0.5, and the upper bound that could be reached by predicting the empirical probabilities exactly. Furthermore, to see how much information we gain over the baseline by using the different models, we report an information score for each single paired comparison: the negative log likelihood per pair and per comparison, expressed in bits, with the baseline model's value subtracted.
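
The sketch below makes this score concrete under the assumption that, for a single pair, it is the per-comparison log loss in bits minus the corresponding loss of the 0.5 baseline; the function names and the example numbers are illustrative and not taken from the thesis.

```python
import numpy as np

def loss_bits(theta_hat, theta, eps=1e-12):
    """Per-comparison log loss in bits for a pair with empirical probability theta."""
    theta_hat = np.clip(theta_hat, eps, 1.0 - eps)
    return -theta * np.log2(theta_hat) - (1.0 - theta) * np.log2(1.0 - theta_hat)

def information_score(theta_hat, theta):
    """Model loss minus baseline (0.5) loss, in bits per comparison; negative
    values mean the model is more informative about the choice than the baseline."""
    return loss_bits(theta_hat, theta) - loss_bits(0.5, theta)

# Hypothetical pair: empirical choice probability 0.8, model prediction 0.75.
# Gives roughly -0.27 bits, i.e. a gain of about 0.27 bits over the baseline.
print(information_score(0.75, 0.8))
```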

The choice set was designed with the tree structure in mind. As expected, the BTL model cannot capture as much information as the other models. The iEBA and tEBA models have the best predictive performance on average, which shows that the underlying structure could be successfully represented by the infinite model. However, we cannot observe the tree structure in the latent features found by the iEBA model. Note that different feature representations can lead to the same choice probabilities.
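
To illustrate this, the sketch below evaluates the standard EBA choice rule for a pair from a binary feature matrix and positive feature weights: only the aspects that distinguish the two items enter the probability, so two feature matrices that differ only in shared aspects induce the same pairwise choice probabilities. The matrices and weights here are toy values, not the fitted models.

```python
import numpy as np

def eba_choice_prob(z, w, a, b):
    """EBA probability of choosing item a over item b given a binary feature
    matrix z (items x aspects) and positive aspect weights w. Shared aspects
    cancel; only the distinguishing aspects contribute."""
    only_a = z[a] * (1 - z[b])   # aspects a has and b lacks
    only_b = z[b] * (1 - z[a])   # aspects b has and a lacks
    wa, wb = only_a @ w, only_b @ w
    return wa / (wa + wb)

# Two feature matrices for the same pair: the second adds a shared aspect,
# yet both give the same choice probability of 1/3.
w = np.array([1.0, 2.0, 3.0])
z_unique = np.array([[1, 0, 0],
                     [0, 1, 0]])
z_shared = np.array([[1, 0, 1],
                     [0, 1, 1]])
print(eba_choice_prob(z_unique, w, 0, 1), eba_choice_prob(z_shared, w, 0, 1))
```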

The mean number of active features for the iEBA model for different pairs is between 30 and 50, far more than the number of features in tEBA. This explains why the average performance of EBA12 and EBA15 is worse than that of iEBA and tEBA even though they could implement the tree structure in principle. As we cannot know how

