Spring 2007 Seminar

 
 
       
  Machine Learning Seminar Series
 
 
  Seminar Schedule (Seminar Organizer: Prof. Ziv Bar-Joseph)
 

 


 

 

Date: January 29, 2007
Time: 11:00 AM - 12:00 PM (Refreshments)
Location: 1507 Newell-Simon Hall
Speaker: Brent Bryan, PhD Candidate
Title: Active Learning Search Strategies for Computing Level Sets
Abstract: In many scientific applications, one is less interested in determining the point which maximizes a function than in knowing the set of all points at which the function exceeds some specified level. For example, consider the task of computing the set of parameters for a given parameterized model which fit some observed data reasonably well. If we have an oracle which can tell us the "goodness" of fit as a function of the input parameters, this problem becomes that of determining the level set which separates those models that fit well from those that do not. In astrophysics, we can use this idea to study the early universe as well as galactic history. By computing confidence intervals for parameterized models in the first case, astronomers can determine the age, composition and eventual fate of the universe, while in the second case astronomers hope to decipher how galaxies form and evolve over time.

While several techniques have been developed to compute the level set of a function, most are either inefficient or lack convergence guarantees for finite sample sizes (or both). For instance, one common way to compute confidence intervals is to use Markov chain Monte Carlo (MCMC). However, MCMC can converge to incorrect solutions when chain lengths are limited. Additionally, MCMC is a procedure for sampling from a posterior distribution, and as such it draws a large number of samples from regions that are well away from the boundary of the confidence region, reducing its efficiency.

Instead, we propose a frequentist-based technique that combines statistical hypothesis tests with an active learning framework to specifically explore the confidence region boundary, while simultaneously sampling the remaining parameter space sufficiently to prove the convergence of the algorithm with a finite number of samples. In this thesis, we will develop an active learning framework to efficiently compute function level sets.
We will demonstrate the utility of the derived framework by developing a frequentist-based statistical inference technique to efficiently compute confidence intervals for model parameters with respect to some observed data. We will then extend this framework to handle the simultaneous computation of confidence intervals for millions of galaxies. Additionally, we will develop algorithms to prove that the resulting intervals are correct within some predefined tolerance.

Thesis Committee: Jeff Schneider (Chair); Chris Genovese; Andrew Moore (Google); Chris Miller (Cerro Tololo Inter-American Observatory, Chile); Bob Nichol (Institute of Cosmology and Gravitation, University of Portsmouth); Larry Wasserman