Abstract:
In many scientific applications, one is less interested in determining the point that maximizes a function than in finding the set of all points whose function values lie above some specified level. For example, consider the task of computing the set of parameters of a given parameterized model that fit some observed data reasonably well. If we have an oracle that reports the "goodness" of fit as a function of the input parameters, this problem becomes that of determining the level set that separates the models that fit well from those that do not. In astrophysics, we can use this idea to study both the early universe and galactic history. In the first case, by computing confidence intervals for the parameters of cosmological models, astronomers can determine the age, composition, and eventual fate of the universe; in the second, they hope to decipher how galaxies form and evolve over time.
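Stated formally, the target can be written as follows. The notation is ours and is not taken from the thesis. Given a goodness-of-fit function f over a parameter space Θ and a level t, we want the superlevel set S_t and the level set L_t that delineates it:

```latex
\mathcal{S}_t = \{\theta \in \Theta : f(\theta) \ge t\}, \qquad
\mathcal{L}_t = \{\theta \in \Theta : f(\theta) = t\}
```

For confidence intervals, one natural instance (assumed here for illustration) takes f(θ) to be the p-value of a test of the hypothesis that θ generated the observed data. Suppose the test rejects only when f(θ) < α. Then S_α is a 1 − α confidence set, obtained by inverting the test.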
While several techniques have been developed to compute the level set of a function, most are either inefficient or lack convergence guarantees for finite sample sizes (or both). For instance, one common way to compute confidence intervals is Markov chain Monte Carlo (MCMC). However, MCMC can converge to incorrect solutions when the chains are short. Additionally, MCMC is a procedure for sampling from a posterior distribution. As a result, it draws many samples from regions well away from the boundary of the confidence region, which reduces its efficiency. Instead, we propose a frequentist technique that combines statistical hypothesis tests with an active learning framework. This technique explores the confidence region boundary specifically, while sampling the remaining parameter space enough to prove that the algorithm converges with a finite number of samples.
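The Python sketch below shows the general idea of actively exploring a level-set boundary. It does not implement the frequentist, hypothesis-test-based procedure proposed here. Instead, it swaps in a Gaussian-process surrogate (RBF kernel, length scale 0.2) and a simple selection score, 1.96·σ(x) − |μ(x) − t|. That score favors candidates whose side of the threshold t is still uncertain. The function names, the one-dimensional candidate grid, the test function, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel matrix between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, jitter=1e-6):
    """Mean and std. dev. of a zero-mean, unit-variance GP fit to (x_obs, y_obs)."""
    k_oo = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    # Prior variance is 1 because rbf_kernel(x, x) == 1.
    var = 1.0 - np.sum(k_qo * np.linalg.solve(k_oo, k_qo.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def active_level_set(f, threshold, candidates, n_init=3, n_iter=15, seed=0):
    """Estimate {x in candidates : f(x) >= threshold} with n_init + n_iter calls to f."""
    rng = np.random.default_rng(seed)
    # Start from a few randomly chosen candidates.
    chosen = [int(i) for i in rng.choice(len(candidates), size=n_init, replace=False)]
    values = [f(candidates[i]) for i in chosen]
    for _ in range(n_iter):
        mean, std = gp_posterior(candidates[chosen], np.array(values), candidates)
        # High score: the model is uncertain AND its prediction is near the threshold.
        score = 1.96 * std - np.abs(mean - threshold)
        score[chosen] = -np.inf  # never re-query an evaluated point
        nxt = int(np.argmax(score))
        chosen.append(nxt)
        values.append(f(candidates[nxt]))
    # Classify every candidate using the final surrogate mean.
    mean, _ = gp_posterior(candidates[chosen], np.array(values), candidates)
    return candidates[mean >= threshold], candidates[chosen]


if __name__ == "__main__":
    grid = np.linspace(0.0, 2.0, 201)
    # For sin(3x) >= 0.5 on [0, 2], the true set is roughly [0.175, 0.873].
    inside, queried = active_level_set(lambda x: np.sin(3 * x), 0.5, grid)
    print(inside.min(), inside.max(), len(queried))
```

The 1.96 factor trades exploration (large σ) against proximity to the boundary. A larger factor samples more of the space before concentrating near the estimated level set.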
In this thesis, we will develop an active learning framework to efficiently compute function level sets. We will demonstrate the utility of this framework by developing a frequentist statistical inference technique that efficiently computes confidence intervals for model parameters with respect to observed data. We will then extend the framework to compute confidence intervals for millions of galaxies simultaneously. Additionally, we will develop algorithms to prove that the resulting intervals are correct within a predefined tolerance.
Thesis Committee:
Jeff Schneider (Chair)
Chris Genovese
Andrew Moore (Google)
Chris Miller (Cerro Tololo Inter-American Observatory, Chile)
Bob Nichol (Institute of Cosmology and Gravitation, University of Portsmouth)
Larry Wasserman