Nonparametric Bayes

… works well for all these variables. This is one of the reasons that ensemble approaches, which average across many models/algorithms, tend to produce state-of-the-art performance in difficult prediction tasks. Combining many simple models, each able to express different characteristics of the data, is useful and conceptually similar to the idea of Bayesian model averaging (BMA), though BMA is typically only implemented within a narrow parametric class (e.g., normal linear regression).

In considering applications of NP Bayes in big data settings, several questions arise. The first is "Why bother?" In particular, what do we have to gain over the rich plethora of machine learning algorithms already available, which are being refined and innovated upon daily by thousands of researchers? There are clear and compelling answers to this question. ML algorithms almost always rely on convex optimization to obtain a point estimate, and uncertainty is seldom of much interest in the ML community, given the types of applications they are faced with. In contrast, in most scientific applications, prediction is not the primary interest, and one is usually focused on inferences that account for uncertainty. For example, the focus may be on assessing the conditional independence structure (graphical model) relating genetic variants, environmental exposures, and cardiovascular disease outcomes (an application I'm currently working on). Obtaining a single estimate of the graph is clearly not sufficient, and would be essentially uninterpretable. Indeed, such graphs produced by ML methods such as the graphical lasso have been deemed "ridiculograms." They critically depend on a tuning parameter that is difficult to choose objectively, and they produce a massive number of connections that cannot be effectively examined visually. Using an NP Bayes approach, we could instead make highly useful statements (at least according to my collaborators), such as (i) the posterior probability that genetic variants in a particular gene are associated with cardiovascular disease risk, adjusting for other factors, is P%; or (ii) the posterior probability that air pollution exposure contributes to risk, adjusted for genetic variants and other factors, is Q%. We can also obtain posterior probabilities of an edge between each pair of variables without parametric assumptions, such as Gaussianity. This is just one example of the utility of probabilistic NP Bayes models; I could list dozens of others.

The question then is why aren't more people using and working on the development of NP Bayes methods? The answer to the first part of this question is clearly computational speed, simplicity, and accessibility. As mentioned above, there is somewhat of a learning curve involved in NP Bayes, which is not covered in most graduate curricula. In contrast, penalized optimization methods, such as the lasso, are both simple and very widely taught. In addition, convex optimization algorithms for very rapidly implementing penalized optimization, especially in big data settings, have been highly optimized and refined in countless publications by leading researchers. This has led to simple methods that are scalable to big data, and which can exploit distributed computing architectures to further scale up to enormous settings. Researchers working on these types of methods often have a computer science or engineering …
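For readers who have not seen it, the BMA idea invoked in the first paragraph can be stated in one line. This is the standard textbook form of the posterior predictive under model averaging, not notation taken from this chapter:

```latex
% Posterior predictive under Bayesian model averaging: each candidate
% model m in the class \mathcal{M} contributes its own prediction,
% weighted by its posterior probability given the observed data y.
p(\tilde{y} \mid y) = \sum_{m \in \mathcal{M}} p(\tilde{y} \mid m, y)\, p(m \mid y),
\qquad
p(m \mid y) = \frac{p(y \mid m)\, p(m)}{\sum_{m' \in \mathcal{M}} p(y \mid m')\, p(m')}.
```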
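The tuning-parameter sensitivity attributed to the graphical lasso above is easy to see empirically. A minimal sketch using scikit-learn's GraphicalLasso; the synthetic data and the grid of alpha values are arbitrary choices made only for illustration:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))  # 200 observations of 10 variables

# The number of edges in the estimated graph swings widely with the
# penalty alpha -- the tuning parameter the text says is hard to choose.
for alpha in (0.02, 0.05, 0.1, 0.3):
    precision = GraphicalLasso(alpha=alpha).fit(X).precision_
    off_diagonal = precision[~np.eye(10, dtype=bool)]
    n_edges = int((np.abs(off_diagonal) > 1e-8).sum()) // 2
    print(f"alpha={alpha}: {n_edges} edges")
```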
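By contrast, the posterior edge probabilities described above fall out of MCMC output with no extra machinery: the probability of an edge is simply the fraction of posterior draws in which that edge appears. A minimal sketch under the assumption that a sampler has already produced 0/1 adjacency-matrix draws (the sampler itself is the hard part and is not shown; `adjacency_samples` and `edge_inclusion_probabilities` are hypothetical names):

```python
import numpy as np

def edge_inclusion_probabilities(adjacency_samples):
    """Posterior probability of each edge, estimated as the fraction of
    MCMC draws in which that edge is present.

    adjacency_samples: array of shape (n_samples, p, p) with 0/1 entries,
    one symmetric adjacency matrix per posterior draw.
    """
    return adjacency_samples.mean(axis=0)

# Stand-in for real sampler output: 1000 fake draws of a 4-node graph.
rng = np.random.default_rng(0)
draws = rng.integers(0, 2, size=(1000, 4, 4))
draws = np.triu(draws, k=1)
draws = draws + draws.transpose(0, 2, 1)  # make each draw symmetric

probs = edge_inclusion_probabilities(draws)
print(probs)  # probs[i, j] ~ posterior probability of an edge between i and j
```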
