Fall 2008 Seminar

Machine Learning Seminar Series

Seminar Schedule (Seminar Organizer: Prof. Ziv Bar-Joseph)
 

 


 

 

Date: November 26, 2008
Time: 09:00 AM - 10:30 AM (Refreshments)
Location: 1507 Newell-Simon Hall
Speaker: Jingrui He, PhD Candidate
Title: Rare Category Detection
Abstract: Rare category detection refers to the problem of identifying examples from the minority classes with the fewest label requests, given an unlabeled, unbalanced data set. It is an open challenge in machine learning and has a wealth of applications, such as financial fraud detection, network intrusion detection, astronomy, and spam image detection. In this thesis, we plan to address this problem from four perspectives: (1) initial class label discovery for various data types, (2) dealing with prior information about the data set, (3) feature selection for rare category detection, and (4) rare category classification. Our recent work focuses on the first two perspectives, i.e., rare category detection for data with a feature representation and for graph data when different amounts of prior information are available. For data with a feature representation, given enough prior information about the data set, we proposed nearest-neighbor-based methods, which essentially perform local density differential sampling. They are proven to be effective both theoretically and experimentally. On the other hand, when no prior information about the data set is available, we proposed a density-based method, which makes use of specially designed exponential families. For graph data, we designed two algorithms that take advantage of the global similarity between two examples. Given the same amount of information, the first algorithm performs better than state-of-the-art techniques, whereas given much less information, the second algorithm is comparable with state-of-the-art techniques. Future work includes three directions. First, for data in a high-dimensional feature space, we will select the features that are most relevant to the minority classes. Second, following rare category detection, we will design effective methods for rare category classification that take into account the fact that the minority classes form compact clusters in the feature space. Third, we will adapt existing rare category detection methods to work with stream data, where the goal is to identify emerging trends as soon as possible.
Thesis Committee: Jaime Carbonell (Chair), John Lafferty, Larry Wasserman, Foster Provost (NYU)
Online document: www.cs.cmu.edu/~jingruih/thesis/jingrui_proposal.pdf