Fall 2007 Seminar

 
 
       
  Machine Learning Seminar Series
 
 
  Seminar Schedule (Seminar Organizer: Prof. Ziv Bar-Joseph)
 

 

Date: December 12, 2007
Time: 09:00 AM - 10:30 AM
Location: 4625 Wean Hall
Speaker: Brent Bryan, Ph.D. Candidate
Title: Actively Learning Specific Function Properties with Applications to Statistical Inference
Abstract: Active learning techniques have previously been shown to be extremely effective for learning a target function over an entire parameter space based on a limited set of observations. However, in many cases, only a specific property of the target function needs to be learned. For instance, when discovering the boundary of a region (such as the locations in which the wireless network strength is above some operable level), we are interested in learning only the level set of the target function. While techniques that learn the entire target function over the parameter space can be used to detect specific properties of the target function (e.g., level sets), methods that learn only the required properties can be significantly more efficient, especially as the dimensionality of the parameter space increases.

These active learning algorithms have a natural application in many statistical inference techniques. For example, given a set of data and a physical model of the data, which is a function of several variables, a scientist is often interested in determining the ranges of the variables that are statistically supported by the data. We show that many frequentist statistical inference techniques can be reduced to a level-set detection problem, or a similar search for a property of the target function, and hence benefit from active learning algorithms that target specific properties. Using these active learning algorithms significantly decreases the number of experiments required to accurately detect the boundaries of the desired 1 - alpha confidence regions. Moreover, since computing the model of the data given the input parameters may be expensive (either computationally or monetarily), such algorithms can facilitate analyses that were previously infeasible.

We demonstrate the use of several statistical inference techniques combined with active learning algorithms on several cosmological data sets. The data sets vary in the dimensionality of the input parameters from two to eight. We show that naive algorithms, such as random sampling or grid-based methods, are computationally infeasible for the higher-dimensional data sets. However, our active learning techniques can efficiently detect the desired 1 - alpha confidence regions. Moreover, the use of frequentist inference techniques allows us to easily perform additional inquiries, such as hypothetical restrictions on the parameters and joint analyses of all the cosmological data sets, with only a small number of additional experiments.

Thesis Committee: Jeff Schneider (Chair), Christopher Genovese, Christopher J Miller (NOAO/CTIO), Andrew Moore (Google), Robert C. Nichol (Univ. Portsmouth), Larry Wasserman.

More information can be found at: http://gs3636.sp.cs.cmu.edu/thesis/main.pdf
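
Illustration (not part of the talk): the idea of spending evaluations only near a level-set boundary can be sketched in one dimension. The code below is a minimal toy, not the speaker's algorithm. The function name `refine_level_set`, the quadratic test statistic, and the evaluation budget are all hypothetical choices made for this example; the threshold 3.84 is the approximate 95% critical value of a chi-square distribution with one degree of freedom. After a coarse initial grid, each new evaluation is placed at the midpoint of the widest interval whose endpoints fall on opposite sides of the threshold.

```python
import math


def refine_level_set(f, lo, hi, threshold, n_initial=9, budget=30):
    """Estimate where f crosses `threshold` on [lo, hi] with at most `budget` calls to f."""
    # Coarse initial design: evenly spaced points across the interval.
    xs = [lo + (hi - lo) * i / (n_initial - 1) for i in range(n_initial)]
    samples = sorted((x, f(x)) for x in xs)
    evaluations = n_initial

    def brackets():
        # Neighbouring samples that lie on opposite sides of the threshold.
        return [
            (b[0] - a[0], a[0], b[0])
            for a, b in zip(samples, samples[1:])
            if (a[1] - threshold) * (b[1] - threshold) < 0
        ]

    while evaluations < budget:
        candidates = brackets()
        if not candidates:
            break  # no sign change seen, so nothing to refine
        # Query the midpoint of the widest bracketing interval.
        _, left, right = max(candidates)
        mid = 0.5 * (left + right)
        samples.append((mid, f(mid)))
        samples.sort()
        evaluations += 1

    # Report the midpoint of each remaining bracket as a boundary estimate.
    boundaries = sorted(0.5 * (left + right) for _, left, right in brackets())
    return boundaries, evaluations


def toy_statistic(theta):
    # Hypothetical test statistic with its minimum at theta = 1.
    return (theta - 1.0) ** 2


boundaries, used = refine_level_set(toy_statistic, -4.0, 6.0, threshold=3.84)
print(boundaries, used)
```

In this toy setting the two boundary estimates bracket the interval of parameter values whose statistic stays below the threshold. A uniform grid would need far more evaluations to reach the same resolution, and the gap widens as the number of parameters grows.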