Fall 2005 Seminar

  Machine Learning Seminar Series
 
  Seminar Schedule (Seminar Organizer: Prof. Ziv Bar-Joseph)
 
Date: November 14, 2005
Time: 4:30 PM - 5:30 PM
Location: A53 Baker Hall
Speaker: Ann Lee
Title: Geometric tools for high-dimensional data analysis
Abstract: In high-dimensional data analysis, real data is often noisy and, in many cases, given in coordinates that are not informative for understanding the structure of the data or for performing later tasks such as clustering, classification, and regression. The combination of noise and very high dimensionality (for example, more than 1000 dimensions) presents challenges for data analysis and calls for efficient dimensionality reduction tools that take the inherent geometry of natural data into account. In this talk, I will first describe a data-driven multi-scale basis that can be used for feature extraction from smooth data as well as from data whose coordinates may be randomly ordered. In the second half of the talk, I will describe a general framework for dimensionality reduction, data set parameterization, and clustering that combines ideas from eigenmaps, spectral graph theory, and harmonic analysis. Our construction is based on a Markov random walk on the data and allows one to define a system of coordinates that is robust to noise and that reflects the intrinsic geometry or connectivity of the data points in a diffusion process. Examples will be taken from image analysis, word-document clustering, and spectroscopy. (Part of this work is joint with R.R. Coifman and S. Lafon.)
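
Illustrative sketch (not taken from the talk): the random-walk construction mentioned in the abstract can be pictured with a minimal, textbook-style diffusion-map embedding. The Gaussian kernel, the bandwidth `epsilon`, the diffusion time `t`, the function name `diffusion_map`, and the toy data below are all assumptions chosen for illustration. The speaker's actual construction may differ in its kernel, normalization, and multi-scale details.

```python
import numpy as np

def diffusion_map(X, epsilon, n_coords=2, t=1):
    """Embed the rows of X using a basic diffusion map (illustrative sketch)."""
    # Pairwise squared Euclidean distances between data points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel (an assumed choice): nearby points get large weights.
    K = np.exp(-sq_dists / epsilon)
    # Row sums; P = D^{-1} K is the transition matrix of a Markov random walk.
    d = K.sum(axis=1)
    # S = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues as P.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    # eigh returns eigenvalues in ascending order; put the largest first.
    eigvals = eigvals[::-1]
    eigvecs = eigvecs[:, ::-1]
    # Convert eigenvectors of S into right eigenvectors of P.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1) and weight the
    # remaining ones by eigenvalue**t to obtain diffusion coordinates.
    coords = psi[:, 1:n_coords + 1] * eigvals[1:n_coords + 1] ** t
    return eigvals, coords

# Toy data (hypothetical): two small groups of points in the plane.
group = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
X = np.vstack([group, group + np.array([3.0, 0.0])])

eigvals, coords = diffusion_map(X, epsilon=2.0, n_coords=2, t=1)
print("leading eigenvalue:", eigvals[0])
print("first diffusion coordinate:", coords[:, 0])
```

In this toy setting the first nontrivial coordinate should take one sign on the first group and the opposite sign on the second, which is the sense in which such coordinates reflect the connectivity of the data under the random walk.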