Supplemental Materials for: Bayesian Semiparametric Inference for Multivariate Doubly-Interval-Censored Data

ALEJANDRO JARA, EMMANUEL LESAFFRE, MARIA DE IORIO AND FERNANDO QUINTANA

Appendix A

MCMC scheme based on a marginal representation

Preliminaries

An alternative way to explore the posterior distribution is to consider the finite dimensional posterior that arises after integrating out the random measure $G$,

$$p(z, \beta, a, b, m, S, \Sigma \mid A^O_i, A^E_i) \propto \prod_{i=1}^{m} \left\{ p(A^O_i, A^E_i \mid T^O_i, T^E_i)\, p(T^O_i, T^E_i \mid z_i)\, p(z_i \mid \beta_i, \Sigma) \right\} \times p(\beta \mid a, b, m, S)\, p(a)\, p(b \mid a) \times p(m)\, p(S)\, p(\Sigma),$$

where $p(T^O_i, T^E_i \mid z_i)$ is a deterministic mapping and $p(\beta \mid a, b, m, S)$ arises by exploiting the prediction rule

$$P(\beta_1 \in B) = G_0(B), \tag{A.1}$$

and, for $m = 1, 2, \ldots$,

$$P(\beta_{m+1} \in B \mid \beta_1, \ldots, \beta_m) = \sum_{j=1}^{n^*(m)} \rho_j(\phi(m))\, \delta_{\beta^*_j}(B) + \rho_{n^*(m)+1}(\phi(m))\, G_0(B), \tag{A.2}$$

where $n^* \equiv n^*(m)$ corresponds to the total number of distinct values among $\beta_1, \ldots, \beta_m$, denoted by $\beta^*_1, \ldots, \beta^*_{n^*}$, the collection of functions $\rho_j$ are weights that add up to 1, and $\phi(m) = (|S_1(m)|, \ldots, |S_{n^*}(m)|)$, with $S_j(m)$, $j = 1, \ldots, n^*$, being the set of $i \in \{1, \ldots, m\}$ such that

$\beta^*_i$) denote the size of the $i$th cluster, $i = 1, \ldots, n^*$, where $I(A) = 1$ if $A$ occurs and 0 otherwise. Representing the vector $\beta = (\beta_1, \ldots, \beta_m)$ in terms of the equivalent representation $(\pi, \beta^*)$, the prior distribution $p(\beta \mid a, b, m, S)$ can be decomposed into a prior for the partition $\pi$, induced by the cluster configurations, $p(\pi \mid a, b)$, and a prior for the cluster locations given the partition, $p(\beta^* \mid \pi, m, S)$, such that

$$p(z, \beta^*, \pi, a, b, m, S, \Sigma \mid A^O_i, A^E_i) \propto \left\{ \prod_{i=1}^{m} p(A^O_i, A^E_i \mid T^O_i, T^E_i) \right\} \times \left\{ \prod_{i=1}^{m} p(T^O_i, T^E_i \mid z_i) \right\} \times \left\{ \prod_{j=1}^{n^*(m)} \prod_{i \in S_j} p(z_i \mid \beta^*_j, \Sigma) \right\} \times \left\{ p(\beta^* \mid \pi, m, S)\, p(\pi \mid a, b) \right\} \times \left\{ p(b \mid a)\, p(a)\, p(m)\, p(S) \right\} \times p(\Sigma), \tag{A.8}$$

where $p(\beta^* \mid \pi, m, S) \equiv \prod_{j=1}^{n^*(m)} N_{n(p+q)}(\beta^*_j \mid m, S)$, and the prior distribution for the partition is given by

$$p(\pi \mid a, b) = \frac{\Gamma(b + 1) \prod_{j=1}^{n^*(m)-1} (b + ja)}{\Gamma(b + m)} \prod_{j=1}^{n^*(m)} \frac{\Gamma(|S_j| - a)}{\Gamma(1 - a)}. \tag{A.9}$$

A more efficient Markov chain, in the sense of faster mixing, can be obtained by updating the partition $\pi$ with respect to a reduced model that results from analytically marginalizing over the $\beta^*$ parameters,

$$p(z, \pi, a, b, m, S, \Sigma \mid A^O_i, A^E_i) \propto \left\{ \prod_{i=1}^{m} p(A^O_i, A^E_i \mid T^O_i, T^E_i) \right\} \times \left\{ \prod_{i=1}^{m} p(T^O_i, T^E_i \mid z_i) \right\} \times \left\{ \prod_{j=1}^{n^*(m)} p(z_{S_j} \mid m, S, \Sigma) \right\} p(\pi \mid a, b) \times p(b \mid a)\, p(a)\, p(m)\, p(S)\, p(\Sigma). \tag{A.10}$$
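The partition prior in (A.9) is straightforward to evaluate on the log scale. The following is a minimal sketch, not the authors' implementation; the function name `log_partition_prior` and the representation of a partition by its list of cluster sizes are assumptions made for illustration.

```python
from math import lgamma, log, exp

def log_partition_prior(sizes, a, b):
    """Log of p(pi | a, b) in (A.9), given the cluster sizes |S_1|, ..., |S_n*|."""
    if not (0.0 <= a < 1.0 and b > -a):
        raise ValueError("need 0 <= a < 1 and b > -a")
    m = sum(sizes)          # number of observations
    k = len(sizes)          # number of clusters, n*(m)
    out = lgamma(b + 1.0) - lgamma(b + m)
    # product over j = 1, ..., n*(m) - 1 of (b + j a)
    out += sum(log(b + j * a) for j in range(1, k))
    # product over clusters of Gamma(|S_j| - a) / Gamma(1 - a)
    out += sum(lgamma(n - a) - lgamma(1.0 - a) for n in sizes)
    return out

# With a = 0 and b = 1, two observations share a cluster with probability 1/2.
p_same = exp(log_partition_prior([2], 0.0, 1.0))
```

As a sanity check, the probabilities of all five partitions of three observations (one with sizes (3), three with sizes (2, 1), one with sizes (1, 1, 1)) add up to 1 for any admissible pair (a, b).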

with

$$\xi^L_{ij}(u^L_{ij}, u^U_{ij}, v^L_{ij}, v^U_{ij}, z_{i,j+n}) = \log\left( \max\left\{ u^L_{ij},\; u^L_{i,j+n} - \exp[z_{i,j+n}] \right\} \right), \tag{A.14}$$

$$\xi^U_{ij}(u^L_{ij}, u^U_{ij}, v^L_{ij}, v^U_{ij}, z_{i,j+n}) = \log\left( \min\left\{ u^U_{ij},\; u^U_{i,j+n} - \exp[z_{i,j+n}] \right\} \right), \tag{A.15}$$

$$\varrho^U_{ij}(u^L_{ij}, u^U_{ij}, v^L_{ij}, v^U_{ij}, z_{i,j-n}) = \log\left( v^U_{ij} - \exp\{z_{i,j-n}\} \right), \tag{A.16}$$

and

$$\varrho^L_{ij}(u^L_{ij}, u^U_{ij}, v^L_{ij}, v^U_{ij}, z_{i,j-n}) = \begin{cases} -\infty & \text{if } v^L_{ij} = u^L_{ij}, \\ \log\left( v^L_{ij} - \exp\{z_{i,j-n}\} \right) & \text{if } v^L_{ij} \neq u^L_{ij}. \end{cases} \tag{A.17}$$

Note that the first term in expression (A.17) arises when the true chronological onset and event times occur in the same interval. Samples from the conditional distribution in (A.13) are obtained by considering a Gibbs scheme through the components of $z_i$, conditional on the rest, which have univariate normal distributions truncated to the corresponding limits.

Updating the partition

Under the collapsed state, the full conditional distribution for the partition is given by

$$p(\pi \mid \cdots) \propto \left\{ \prod_{j=1}^{n^*(m)} p(z_{S_j} \mid m, S, \Sigma) \right\} p(\pi \mid a, b) = \prod_{i=1}^{m} \left[ \frac{1}{i - 1 + b} \left\{ \sum_{j=1}^{n^*(i-1)} \left( |S_j(i-1)| - a \right) p\left( z_i \mid m, S, \Sigma, z_{S_j(i-1)} \right) \right\} + \frac{1}{i - 1 + b} \left( b + n^*(i-1)\, a \right) p(z_i \mid m, S, \Sigma) \right], \tag{A.18}$$

where

$$p(z_i \mid m, S, \Sigma) = \int p(z_i \mid \beta^*_j, \Sigma)\, p(\beta^*_j \mid m, S)\, d\beta^*_j \equiv N_{2n}\left( z_i \mid X_i m,\; \Sigma + X_i S X_i^T \right),$$

and

$$p\left( z_i \mid m, S, \Sigma, z_{S_j(i-1)} \right) = \int p(z_i \mid \beta^*_j, \Sigma) \times p\left( \beta^*_j \mid \{z_l : l \in S_j(i-1),\, l < i\}, m, S, \Sigma \right) d\beta^*_j \equiv N_{2n}(z_i \mid X_i \theta^*, \Delta^*),$$
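Each factor of the product in (A.18) is a sum of one term per existing cluster plus one term for a new cluster. Normalising those terms gives allocation probabilities for observation $i$; the sketch below assumes the predictive densities have already been evaluated and are passed in as plain numbers. The function name `allocation_probs` and its arguments are hypothetical, and how the weights are used inside the sampler is not shown here.

```python
def allocation_probs(sizes, a, b, pred_existing, pred_new):
    """Normalised terms of one factor of (A.18).

    sizes         -- current cluster sizes |S_j(i-1)|
    pred_existing -- p(z_i | m, S, Sigma, z_{S_j(i-1)}) for each existing cluster
    pred_new      -- p(z_i | m, S, Sigma), the prior predictive density
    """
    k = len(sizes)
    # Existing cluster j: (|S_j| - a) times its posterior predictive density.
    weights = [(n - a) * p for n, p in zip(sizes, pred_existing)]
    # New cluster: (b + n* a) times the prior predictive density.
    weights.append((b + k * a) * pred_new)
    # The common factor 1 / (i - 1 + b) cancels on normalisation.
    total = sum(weights)
    return [w / total for w in weights]

# Three earlier observations in clusters of sizes 2 and 1, equal predictive densities.
probs = allocation_probs([2, 1], a=0.5, b=1.0, pred_existing=[1.0, 1.0], pred_new=1.0)
```

With equal predictive densities the result reduces to the prior prediction rule: here the probabilities are 0.375 and 0.125 for the two existing clusters and 0.5 for a new one.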

Updating the PD parameters a and b

The full conditional distribution for the PD parameters $a$ and $b$ is proportional to the product of expressions (15) and (16) in the paper and expression (A.9):

$$p(a, b \mid \cdots) \propto p(\pi \mid a, b)\, p(b \mid a, \mu_b, \sigma_b)\, p(a \mid \lambda, \alpha_0, \alpha_1).$$

Due to the non-standard mixture prior specification for $a$, the full conditional distribution is absolutely continuous with respect to the product measure $\nu_a \times \nu_b$, where $\nu_a = \delta_0 + \psi$, with $\delta_0$ being the Dirac measure at 0 and $\psi$ the Lebesgue measure restricted to $(0, 1)$, and $\nu_b$ is the Lebesgue measure restricted to $(-a, +\infty)$. To update the PD parameters we consider a Metropolis-Hastings (MH) step. In order to use MH, we need to define an irreducible Markov transition kernel (proposal) that is absolutely continuous with respect to $\nu_a \times \nu_b$. We achieve this by considering a generalization of the random walk Metropolis kernel with proposal density given by $(a', b') \sim q(a')\, q(b' \mid a')$, where

$$q(a') = \tau_1 \delta_0 + (1 - \tau_1)\, N(a, \tau_2),$$

and

$$q(b' \mid a') = N(b, \tau_2)\, I(-a', +\infty),$$

with $\tau_1 \in (0, 1)$ and $\tau_2 \in \mathbb{R}^+$ being fixed numbers that can be tuned to obtain good mixing. We found that $\tau_1 = 0.1$ and $\tau_2 = 1$ provide good starting points, yielding acceptance rates around 25% in our applications of the model.
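Drawing from this proposal can be sketched as follows. This is an illustration only, not the authors' code: the function name `propose_pd_params` is hypothetical, $\tau_2$ is assumed to be a variance (the text does not say whether it is a variance or a standard deviation), and the truncated normal is sampled by inverting the normal CDF.

```python
import random
from statistics import NormalDist

def propose_pd_params(a, b, tau1=0.1, tau2=1.0, rng=random):
    """Draw (a', b') from q(a') q(b' | a')."""
    sd = tau2 ** 0.5                      # assumption: tau2 is a variance
    if rng.random() < tau1:
        a_new = 0.0                       # point mass at 0, with probability tau1
    else:
        a_new = rng.gauss(a, sd)          # random walk step around the current a
    # b' ~ N(b, tau2) truncated to (-a', +inf), by inverse-CDF sampling
    dist = NormalDist(b, sd)
    lower = dist.cdf(-a_new)
    u = lower + (1.0 - lower) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf away from 0 and 1
    b_new = dist.inv_cdf(u)
    return a_new, b_new

a_prop, b_prop = propose_pd_params(a=0.3, b=1.0, rng=random.Random(0))
```

A candidate with $a'$ outside $[0, 1)$ has zero posterior density, so the MH step rejects it. The acceptance ratio also needs the proposal densities in both directions, including the normalising constant of the truncated normal, which this sketch does not compute.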

Updating the log-survival times

The latent data vector $z$ is updated using the same approach previously described in Appendix A.

Updating the stick-breaking variables

The partial sum approximation of the RPM and the stick-breaking representation of the weights allow a simple updating scheme given by

$$V_i \mid a, b, s \overset{ind}{\sim} \text{Beta}\left( |S_i| + 1 - a,\; b + ia + \sum_{j=i+1}^{N} |S_j| \right), \tag{B.4}$$

where $i = 1, \ldots, N - 1$, $S_i = \{j : s_j = i,\ j = 1, \ldots, m\}$ is the set of observations in cluster $i$, and $|S_i| = \sum_{j=1}^{m} I(s_j = i)$ denotes the cluster size.

Updating the cluster configurations

The full conditional distribution for the cluster configurations takes the form of a discrete distribution,

$$s_i \mid v, z_i, \beta^*, \Sigma \overset{ind}{\sim} \text{Discrete}\{w^*_1, \ldots, w^*_N\}, \tag{B.5}$$

where $w^*_j \propto V_j \prod_{l=1}^{j-1} (1 - V_l)\, N_{2n}(z_i \mid X_i \beta^*_j, \Sigma)$.

Updating the cluster locations

Given the current state of the cluster configurations $s$, the cluster locations are updated by sampling from the full conditional $p(\beta^* \mid z, s, m, S, \Sigma)$. This is performed by considering draws from

$$p(\beta^*_j \mid z_{S_j}, m, S, \Sigma), \quad j = 1, \ldots, N, \tag{B.6}$$

which takes the form of expression (A.5), replacing $X_i$ and $z_i$ by $X_{S_j}$ and $z_{S_j}$, respectively, if $|S_j| \neq 0$, or of a $N_{n(p+q)}(m, S)$ distribution if $|S_j| = 0$.

Updating the baseline parameters and the kernel-dispersion matrix

Due to the conjugacy of the model, the full conditional distributions for the baseline parameters and the covariance matrix of the normal kernel take the form of expressions (A.20), (A.21) and (A.22), but where

$$\mu^* = \Upsilon^* \left( \Upsilon^{-1} \mu + \sum_{i=1}^{N} S^{-1} \beta^*_i \right), \quad \Upsilon^* = \left( N S^{-1} + \Upsilon^{-1} \right)^{-1}, \quad \gamma^* = \gamma + N, \quad \nu^* = \nu + m,$$

$$\Gamma^* = \left( \Gamma^{-1} + \sum_{i=1}^{N} (\beta^*_i - m)(\beta^*_i - m)^T \right)^{-1},$$

and

$$\Omega^* = \left( \Omega^{-1} + \sum_{i=1}^{N} (Z_{S_i} - X_{S_i} \beta^*_i)(Z_{S_i} - X_{S_i} \beta^*_i)^T \right)^{-1}.$$

Updating the PD parameters a and b

The full conditional distribution for the PD parameters $a$ and $b$ is proportional to the product of expressions (15) and (16) in the paper with

$$p(v \mid a, b) = \prod_{i=1}^{N-1} \frac{\Gamma(b + (i - 1)a + 1)}{\Gamma(1 - a)\, \Gamma(b + ia)}\, V_i^{-a} (1 - V_i)^{b + ia - 1}.$$

The same generalization of the random walk Metropolis kernel described in Appendix A can be used to update $a$ and $b$, with the corresponding change in the posterior evaluation of current and candidate values for these parameters.
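The stick-breaking update (B.4) and the weights in (B.5) can be sketched in a few lines. The code below is illustrative and makes several assumptions: the names `update_sticks` and `label_probs` are hypothetical, labels take values in 1, ..., N, the truncation is closed by setting $V_N = 1$, and the normal kernel values $N_{2n}(z_i \mid X_i \beta^*_j, \Sigma)$ are supplied as precomputed numbers in `lik`.

```python
import random

def update_sticks(s, N, a, b, rng=random):
    """Draw V_1, ..., V_N given the labels s, following (B.4)."""
    counts = [0] * (N + 1)               # counts[i] = |S_i|; index 0 is unused
    for label in s:
        counts[label] += 1
    V = []
    remaining = len(s)
    for i in range(1, N):
        remaining -= counts[i]           # now equals sum over j > i of |S_j|
        V.append(rng.betavariate(counts[i] + 1.0 - a, b + i * a + remaining))
    V.append(1.0)                        # assumption: V_N = 1 closes the truncation
    return V

def label_probs(V, lik):
    """Normalised w*_j, proportional to V_j * prod_{l<j}(1 - V_l) * lik[j], as in (B.5)."""
    probs, stick_left = [], 1.0
    for v, l in zip(V, lik):
        probs.append(v * stick_left * l)
        stick_left *= 1.0 - v            # mass left after breaking off stick j
    total = sum(probs)
    return [p / total for p in probs]

V = update_sticks([1, 1, 2, 3, 1], N=4, a=0.2, b=1.0, rng=random.Random(1))
w = label_probs([0.5, 0.5, 1.0], [1.0, 1.0, 1.0])
```

With equal kernel values the second call returns the stick-breaking weights themselves, here 0.5, 0.25 and 0.25. A new label for observation $i$ could then be drawn with `random.choices(range(1, N + 1), weights=w)`.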

Appendix C

The HIV-AIDS data

We perform the analysis of a dataset considered by De Gruttola & Lagakos (1989). They analyzed information from a cohort of hemophiliacs at risk of human immunodeficiency virus (HIV) infection from infusions of blood they received periodically to treat their hemophilia in two hospitals in France. The data comprise information from 262 patients with Type A or B hemophilia treated since 1978. All infected patients are believed to have become infected by contaminated blood factor: 105 patients received at least 1,000 µg/kg of blood factor for at least one year between 1982 and 1985 (heavily treated group), and 157 patients received less than 1,000 µg/kg in each year (lightly treated group).

For this cohort, both infection with HIV and the onset of acquired immunodeficiency syndrome (AIDS) or other clinical symptoms could be subject to censoring. Therefore, the induction time between infection and clinical AIDS is treated as doubly-censored. The periodic observation of HIV infection status was possible because blood samples were stored and retrospectively tested for evidence of infection with HIV. Note that both the distribution of chronological time of infection and induction time are of interest. In De Gruttola & Lagakos (1989) the proposed nonparametric maximum likelihood one-sample estimator was illustrated by considering the statistics $(u^L_i, u^U_i, v^L_i, v^U_i)$, $i = 1, \ldots, m$, which were the results of a discretization of the time axis into 6-month intervals. Although an important advantage of our approach is that no discretization is required, we still do so for the sake of comparison with previous analyses (De Gruttola & Lagakos, 1989).

Different models were considered. We chose three values of the parameter $\lambda$ to reflect different degrees of departure from the LDDP model. Specifically, we chose $\lambda = 0.3$, $\lambda = 0.5$ and $\lambda = 0.7$. For all the models we used $\alpha_0 = \alpha_1 = 1$, $\mu_b = 10$, $\sigma_b = 200$, $\nu = 4$, $\Omega = I_2$, $\gamma = 5$, $\Gamma = I_4$, $\eta = 0_4$, and $\Upsilon = 100 I_4$. For each model, 4.02 million samples of a Markov chain cycle were completed. Because of storage limitations and dependence, the full chain was sub-sampled every 200 steps after a burn-in period of 20,000 samples, to give a reduced chain of length 20,000.

The resulting estimates of the cumulative distribution of times of HIV sero-conversion and induction time between HIV sero-conversion and onset of symptoms are displayed in Figure 1. The figure shows a sharp rise in the frequency of infections at time 6, with infections for the heavily treated group tending to occur somewhat sooner, which largely agrees with the results obtained by De Gruttola & Lagakos (1989). Figure 1 also shows that the estimated induction distribution rises more quickly in the first 6 time-periods following infection for the heavily treated group. However, due to the small number of patients who developed clinical conditions, the HPD intervals were rather large. Once more, the results largely agree with those obtained by De Gruttola

[Figure 1 appears here: six panels, (a) to (f), each plotting the cumulative distribution (vertical axis, 0 to 1) against time in 6-month periods (horizontal axis).]

Figure 1: HIV-AIDS data: Estimated cumulative distribution of time of HIV sero-conversion (panels a, c, and e) and induction times between HIV sero-conversion and onset of symptoms (panels b, d, and f) for λ = 0.3 (panels a and b), λ = 0.5 (panels c and d) and λ = 0.7 (panels e and f), respectively. Red and blue lines correspond to the heavily and lightly treated group, respectively. The posterior means (solid lines) are presented along with the point-wise 95% HPD intervals.

& Lagakos (1989). Our estimates represent a smooth version of the ones previously reported by those authors. No differences were observed when considering different values for λ, suggesting that the resulting survival curves are robust to the prior specification.

The resulting estimates of the hazard function of times of HIV sero-conversion and induction time between HIV sero-conversion and onset of symptoms are displayed in Figure 2. The figure shows a sharp rise in the hazard of infection at time 6, with a significantly larger increase in the heavily treated than in the lightly treated group. Figure 2 also shows that the estimated hazard functions for induction in both treatment groups behave similarly after the 7 years following infection, and that the hazard for induction rises more quickly in the first 6 time-periods following infection for the heavily treated than for the lightly treated group. However, due to the small number of patients, no significant differences were found. Again, no differences were observed when considering different values for λ, suggesting that the results are robust to the prior specification.

The posterior mean of the median infection time was 11.6 periods (5.8 years) for the heavily treated group and 14.4 periods (7.2 years) for the lightly treated group. The posterior mean of the median latency time was 13.8 periods (6.9 years) for the heavily treated group and 15.7 periods (7.9 years) for the lightly treated group.

The posterior probabilities for the LDDP model were 0.56, 0.66, and 0.85 for λ = 0.3, λ = 0.5, and λ = 0.7, respectively. These posterior probabilities were 87%, 32%, and 21% larger than the corresponding prior beliefs in the LDDP model and suggest that the LDDP model fits the observed data better. Note that even though the sample size was only 262, the updated probabilities for the LDDP model suggest that the data contain real information about the parameter a.
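The reported chain length and relative increases can be checked with a few lines of arithmetic. The sketch assumes that the prior probability of the LDDP model equals λ in each case; this is not stated explicitly in this section, but it is consistent with the percentages quoted above.

```python
# Thinning: 4.02 million draws, 20,000 burn-in, every 200th draw kept.
total_draws = 4_020_000
burn_in = 20_000
thin = 200
kept = (total_draws - burn_in) // thin

# Relative increase of posterior over prior probability of the LDDP model.
prior = [0.3, 0.5, 0.7]          # assumption: prior P(LDDP) = lambda
posterior = [0.56, 0.66, 0.85]
increase_pct = [round(100 * (post / pri - 1)) for pri, post in zip(prior, posterior)]

# Medians are reported in 6-month periods; dividing by 2 converts to years.
median_years = [periods / 2 for periods in (11.6, 14.4, 13.8, 15.7)]
```

Here `kept` evaluates to 20,000 and `increase_pct` to 87, 32 and 21, matching the figures in the text.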

[Figure 2 appears here: six panels, (a) to (f), each plotting the hazard (vertical axis) against time in 6-month periods (horizontal axis).]

Figure 2: HIV-AIDS data: Estimated hazard function for time of HIV sero-conversion (panels a, c, and e) and induction times between HIV sero-conversion and onset of symptoms (panels b, d, and f) for λ = 0.3 (panels a and b), λ = 0.5 (panels c and d) and λ = 0.7 (panels e and f), respectively. Red and blue lines correspond to the heavily and lightly treated group, respectively. The posterior means (solid lines) are presented along with the point-wise 95% HPD intervals.

References

DAHL, D. B. (2005). Sequentially-Allocated Merge-Split Sampler for Conjugate and Nonconjugate Dirichlet Process Mixture Models. Technical Report, Texas A&M University, USA.

DE GRUTTOLA, V. & LAGAKOS, S. W. (1989). Analysis of doubly-censored survival data, with application to AIDS. Biometrics 45 1–11.

ISHWARAN, H. & JAMES, L. F. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association 96 161–173.

JAIN, S. & NEAL, R. M. (2004). A split-merge Markov Chain Monte Carlo procedure for the Dirichlet Process mixture model. Journal of Computational and Graphical Statistics 13 158–182.

MACEACHERN, S. N. (1994). Estimating normal means with a conjugate style Dirichlet process prior. Communications in Statistics: Simulation and Computation 23 727–741.

MULIERE, P. & TARDELLA, L. (1998). Approximating distributions of random functionals of Ferguson-Dirichlet priors. The Canadian Journal of Statistics 26 283–297.

NAVARRETE, C., QUINTANA, F. A. & MÜLLER, P. (2008). Some issues on nonparametric Bayesian modeling using species sampling models. Statistical Modelling 8 3–21.

PAPASPILIOPOULOS, O. & ROBERTS, G. O. (2008). Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95 169–186.

PITMAN, J. (1996). Some developments of the Blackwell-MacQueen urn scheme. In T. S. Ferguson, L. S. Shapley & J. B. MacQueen, eds., Statistics, Probability and Game Theory. Papers in Honor of David Blackwell. IMS Lecture Notes - Monograph Series, Hayward, California, 245–268.