| Abstract: |
Over the past few years, new nonlinear dimension reduction techniques have emerged from the machine learning community. These tools are based on the spectral properties of certain diffusion operators, and they aim to find low-dimensional parametrizations of data sets. This type of approach has proven successful in achieving high-performance classification and regression of high-dimensional data.
In this talk, I present a powerful framework for dimension reduction based on the spectral properties of certain Markov chains on the data. In particular, I describe how to construct coordinates parametrizing any data set that can be put in the form of a graph. These coordinates are used to learn the intrinsic geometry of the data, and therefore they allow dimensionality reduction. I also introduce an explicit metric on the data, the diffusion distance, that proves to be extremely useful for classification and regression purposes. This metric allows the design of inference algorithms based on the preponderance of evidence and generalizes the concept of PageRank. The diffusion techniques also form a natural framework for data fusion and multisensor integration. All these ideas are illustrated with various examples from document classification, lexicon organization and automatic concept extraction, graph matching, and audio and visual pattern recognition.
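As a rough illustration of the construction described above, the following is a minimal sketch of diffusion coordinates computed from a weighted graph. It is not necessarily the exact construction presented in the talk: the function names `gaussian_affinity` and `diffusion_map`, the Gaussian kernel, the bandwidth `eps`, and the absence of any density normalization are all assumptions made for this example. The sketch row-normalizes the graph weights into a Markov matrix, takes its eigenvectors, and scales them by powers of the eigenvalues, so that Euclidean distance between the resulting coordinates matches the diffusion distance at time `t` when all non-trivial coordinates are kept.

```python
import numpy as np


def gaussian_affinity(X, eps=1.0):
    """Hypothetical helper: Gaussian edge weights between the rows of X."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / eps)


def diffusion_map(W, n_coords=2, t=1):
    """Diffusion coordinates from a symmetric, nonnegative weight matrix W.

    Assumes the graph is connected and t is a positive integer.
    Returns (eigenvalues in descending order, coordinates of shape (n, n_coords)).
    """
    d = W.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)

    # The Markov matrix P = D^{-1} W is conjugate to the symmetric matrix
    # S = D^{-1/2} W D^{-1/2}, so the two share eigenvalues.
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(-evals)             # sort eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]

    # Right eigenvectors of P, normalized to unit norm with respect to the
    # stationary distribution pi = d / sum(d).
    psi = evecs * d_inv_sqrt[:, None] * np.sqrt(d.sum())

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining ones by lambda^t.
    coords = psi[:, 1:n_coords + 1] * evals[1:n_coords + 1] ** t
    return evals, coords
```

Keeping only the first few coordinates gives a low-dimensional embedding; the approximation to the diffusion distance is controlled by how quickly the eigenvalues decay.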
This is joint work with R.R. Coifman (Yale), Y. Keller (Yale), I.G. Kevrekidis (Princeton), A.B. Lee (CMU), M. Maggioni (Yale), and B. Nadler (Weizmann Institute).
|