MEDG Spring 2013 Reading Group

Info:

We meet Tuesdays at 1:00 pm in the CSAIL 32-250 area.

Upcoming Papers:

Unsupervised Pattern Discovery in Electronic Health Care Data Using Probabilistic Clustering Models, B. Marlin et al. (Download the PDF)

Bedside clinicians routinely identify temporal patterns in physiologic data in the process of choosing and administering treatments intended to alter the course of critical illness for individual patients. Our primary interest is the study of unsupervised learning techniques for automatically uncovering such patterns from the physiologic time series data contained in electronic health care records. This data is sparse, high-dimensional and often both uncertain and incomplete. In this paper, we develop and study a probabilistic clustering model designed to mitigate the eects of temporal sparsity inherent in electronic health care records data. We evaluate the model qualitatively by visualizing the learned cluster parameters and quantitatively in terms of its ability to predict mortality outcomes associated with patient episodes. Our results indicate that the model can discover distinct, recognizable physiologic patterns with prognostic signicance.
Discriminative Learning of Sum-Product Networks, R. Gens, P. Domingos (Download the PDF)
Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using ''hard'' gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset.
No voodoo here! Learning discrete graphical models via inverse covariance estimation, P. Loh, M. Wainwright (Download the PDF)
We investigate the relationship between the support of the inverses of generalized covariance matrices and the structure of a discrete graphical model. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results which were previously established only for multivariate Gaussian distributions, and partially answers an open question about the meaning of the inverse covariance matrix of a non-Gaussian distribution. We propose graph selection methods for a general discrete graphical model with bounded degree based on possibly corrupted observations, and verify our theoretical results via simulations. Along the way, we also establish new results for support recovery in the setting of sparse high-dimensional linear regression based on corrupted and missing observations.
Ordered Rules for Classification: A Discrete Optimization Approach to Associated Classification, A. Chang, D. Bertsimas, C. Rudin, (Download the PDF)
We aim to design classifiers that have the interpretability of association rules yet have predictive power on par with the top machine learning algorithms for classification. We propose a novel mixed integer optimization (MIO) approach called Ordered Rules for Classification (ORC) for this task. Our method has two parts. The first part mines a particular frontier of solutions in the space of rules, and we show that this frontier contains the best rules according to a variety of interestingness measures. The second part learns an optimal ranking for the rules to build a decision list classifier that is simple and insightful. We report empirical evidence using several different datasets to demonstrate the performance of this method.

Completed Papers:

TCA: High Dimensional PCA for Non-Gaussian Data, F. Han, H. Liu, (Download the PDF)
We propose a high dimensional semiparametric scale-invariant principle component analysis, named TCA, by utilize the natural connection between the elliptical distribution family and the principal component analysis. Elliptical distribution family includes many well-known multivariate distributions like multivariate t and logistic and it is extended to the meta-elliptical by Fang (2002) using the copula techniques. In this paper we extend the meta-elliptical distribution family to a even larger family, called transelliptical. We prove that TCA can obtain a near-optimal s(log d/n)^{1/2} estimation consistency rate in the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities and possess arbitrarily continuous marginal distributions. A feature selection result with explicit rate is also provided. TCA is also implemented in both numerical simulations and large-scale stock data to illustrate its empirical performance. Both theories and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost.

Bayesian Nonparametric Methods for Markov Switching Processes, E.B. Fox, E.B. Sudderth, M.I. Jordan, A.S. Willsky (Download the PDF)
Markov switching processes, such as the hidden Markov model (HMM) and switching linear dynamical system (SLDS), are often used to describe rich dynamical phenomena. They describe complex behavior via repeated returns to a set of simpler models: imagine a person alternating between walking, running, and jumping behaviors, or a stock index switching between regimes of high and low volatility. Classical approaches to identification and estimation of these models assume a fixed, pre-specified number of dynamical models. We instead examine Bayesian nonparametric approaches that define a prior on Markov switching processes with an unbounded number of potential model parameters (i.e., Markov modes). By leveraging stochastic processes such as the beta and Dirichlet process, these methods allow the data to drive the complexity of the learned model, while still permitting efficient inference algorithms. They also lead to generalizations which discover and model dynamical behaviors shared among multiple related time series.

Marzyeh Ghassemi

Academic Homepage of Marzyeh Ghassemi @ MIT CSAIL

MEDG Spring 2013 Reading Group