Abstract:
Active learning techniques have previously been shown to be extremely
effective for learning a target function over an entire parameter
space based on a limited set of observations. However, in many cases,
only a specific property of the target function needs to be learned.
For instance, when discovering the boundary of a region (such as the
locations in which the wireless network strength is above some
operable level), we are interested in learning only the level set
of the target function. While techniques that learn the entire target
function over the parameter space can be used to detect specific
properties of the target function (e.g., level sets), methods that
learn only the required properties can be significantly more
efficient, especially as the dimensionality of the parameter space
increases.
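To make the efficiency argument concrete in the simplest possible setting, here is a toy sketch. It is not the thesis's algorithm. It assumes a one-dimensional, monotonically increasing target function, so the level set {x : f(x) >= t} is a half-interval with a single boundary point. An adaptive strategy (bisection), which chooses each query based on earlier answers, locates that boundary to tolerance `tol` with roughly log2(1/tol) evaluations. A non-adaptive grid fine enough to resolve the same boundary needs roughly 1/tol. The function names and the example target are hypothetical.

```python
import math


def bisect_boundary(f, threshold, lo, hi, tol):
    """Adaptively locate the boundary x* where f crosses `threshold`.

    Toy assumption: f is increasing on [lo, hi] with
    f(lo) < threshold <= f(hi). Returns (estimate, number_of_evaluations).
    """
    evals = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        evals += 1
        if f(mid) < threshold:
            lo = mid  # boundary lies to the right of mid
        else:
            hi = mid  # mid is already inside the level set
    return 0.5 * (lo + hi), evals


def grid_boundary(f, threshold, lo, hi, tol):
    """Non-adaptive baseline: evaluate f on a uniform grid of spacing <= tol,
    then report the first grid point inside the level set."""
    n = int(math.ceil((hi - lo) / tol)) + 1
    xs = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    values = [f(x) for x in xs]  # the grid spends n evaluations up front
    for x, v in zip(xs, values):
        if v >= threshold:
            return x, n
    return hi, n


def cube(x):
    """Hypothetical target function for the demo."""
    return x ** 3


if __name__ == "__main__":
    print(bisect_boundary(cube, 0.5, 0.0, 1.0, 1e-3))  # 10 evaluations
    print(grid_boundary(cube, 0.5, 0.0, 1.0, 1e-3))    # 1001 evaluations
```

Here the adaptive search uses 10 evaluations where the grid uses 1001. A grid's cost grows exponentially with the number of parameters, so the gap widens in higher dimensions. Practical level-set active learners must also handle non-monotone, noisy, multi-dimensional targets (typically through a surrogate model of the function), which this toy does not attempt.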
These active learning algorithms have a natural application in many
statistical inference techniques. For example, given a set of data
and a physical model of the data, which is a function of several
variables, a scientist is often interested in determining the ranges
of the variables which are statistically supported by the data. We
show that many frequentist statistical inference techniques can be
reduced to a level-set detection problem, or to a similar search for a
property of the target function, and hence benefit from active
learning algorithms that target specific properties. Using these
active learning algorithms significantly decreases the number of
experiments required to accurately detect the boundaries of the
desired 1 - alpha confidence regions. Moreover, since computing the
model of the data given the input parameters may be expensive (either
computationally or monetarily), such algorithms can facilitate
analyses that were previously infeasible.
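One standard way to see the reduction to level-set detection is sketched here under stated assumptions. It is not the thesis's specific procedure. For a model with Gaussian errors, a 1 - alpha confidence region for k parameters can be taken as the set of parameter values theta whose chi-square misfit lies within a critical value c of its minimum. That set is the level set {theta : chi2(theta) - chi2_min <= c}. For a model that is linear in its parameters with known noise, this difference follows a chi-square distribution with k degrees of freedom exactly. For k = 2 the quantile has the closed form c = -2 ln(alpha), about 5.991 for alpha = 0.05. The toy data, the linear model y = a + b*x, and all function names are hypothetical.

```python
import math

# Hypothetical toy data: y = a + b*x observed with known Gaussian noise SIGMA.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [0.9, 3.1, 4.8, 7.2, 9.1]
SIGMA = 1.0


def chi2(a, b):
    """Chi-square misfit of the model y = a + b*x to the toy data."""
    return sum(((y - (a + b * x)) / SIGMA) ** 2 for x, y in zip(XS, YS))


def best_fit():
    """Closed-form least-squares estimate (a_hat, b_hat), which minimizes chi2."""
    n = len(XS)
    mx, my = sum(XS) / n, sum(YS) / n
    sxx = sum((x - mx) ** 2 for x in XS)
    sxy = sum((x - mx) * (y - my) for x, y in zip(XS, YS))
    b = sxy / sxx
    return my - b * mx, b


def critical_value_2dof(alpha):
    """1 - alpha quantile of a chi-square with 2 degrees of freedom: -2 ln(alpha)."""
    return -2.0 * math.log(alpha)


def level_set_fn(a, b, chi2_min, alpha=0.05):
    """g(theta) = chi2(theta) - chi2_min - c; the 1 - alpha region is {g <= 0}."""
    return chi2(a, b) - chi2_min - critical_value_2dof(alpha)


A_HAT, B_HAT = best_fit()
CHI2_MIN = chi2(A_HAT, B_HAT)

print(round(critical_value_2dof(0.05), 3))              # 5.991
print(level_set_fn(A_HAT, B_HAT, CHI2_MIN) <= 0)        # True: best fit is inside
print(level_set_fn(A_HAT + 5.0, B_HAT, CHI2_MIN) <= 0)  # False: far outside
```

An active learner would then concentrate its model evaluations near g(theta) = 0, the region boundary, rather than covering (a, b) uniformly. In a real analysis, each call to `chi2` may hide an expensive model computation.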
We demonstrate several statistical inference techniques combined
with active learning algorithms on a number of cosmological data
sets, whose input parameter spaces range in dimensionality from two
to eight. We show that naive algorithms, such as random sampling or
grid-based methods, are computationally infeasible for the
higher-dimensional data sets, whereas our active learning techniques
can efficiently detect the desired 1 - alpha confidence regions.
Moreover, the use of frequentist inference techniques allows us to
easily perform follow-up investigations, such as hypothetical
restrictions on the parameters and joint analyses of all the
cosmological data sets, with only a small number of additional
experiments.
Thesis Committee:
Jeff Schneider (Chair),
Christopher Genovese,
Christopher J Miller (NOAO/CTIO),
Andrew Moore (Google),
Robert C. Nichol (Univ. Portsmouth),
Larry Wasserman.
More information can be found at:
http://gs3636.sp.cs.cmu.edu/thesis/main.pdf