Patrick Harrington

Patrick Harrington, Ph.D.
10

Ph.D. Program
Sr. Dir. of Machine Learning Engineering
Workday, Inc.

Chair

Dissertation Title

Inverse Problems in High Dimensional Stochastic Systems Under Uncertainty

Research Interest

Increasingly often, problems in modern medicine, quantitative finance, or social networking involve tens of thousands of variables that interact with each other and jointly evolve over time. The states of these variables may correspond to the phenotype of a particular individual, the price of a security, or the current status of an individual's social networking profile. If these states are hidden from a researcher, they must be inferred from measurements of other variables, knowledge of the interacting network structure, and any dynamics that model the evolution of these states. This dissertation attempts to address general problems of reasoning under uncertainty in such spatio-temporal models, with an emphasis on applications in predictive health and disease in a loosely monitored population of individuals. The motivation is highly interdisciplinary and draws on tools and concepts from machine learning, statistics, epidemiology, bioinformatics, and physics. We begin by presenting a solution for recursively sampling the subset of nodes/variables that elicits the largest expected information gain over all sampled and unsampled nodes in a large spatio-temporal complex network. We then present a tractable method for empirically estimating the spatio-temporal graphical model structure corresponding to the "susceptible", "infected", and "recovered" (SIR) model of mathematical epidemiology. Here, we formulate the problem as an L1-penalized likelihood convex program and produce network detection performance superior to other comparable state-of-the-art methods. Next, we present a logistic regression classifier that is robust to worst-case bounded measurement uncertainty. The proposed method produces worst-case detection performance superior to that of the standard L1-logistic regression classifier on a Human rhinovirus (HRV) gene expression data set.
The final chapter identifies the appropriate basis functions for a classification model when the data are both high-dimensional and temporally sampled, with the ultimate goal of discriminating between multiple states/labels, e.g., phenotypes. We use Gaussian processes and L1-logistic regression to accomplish this task and apply the approach to a human gene expression time-series data set resulting from challenge-study inoculation with Human Influenza A/H3N2, HRV, and Human respiratory syncytial virus (RSV).

Current Placement

Workday, Inc.