Spring 2006 Seminar

  Machine Learning Seminar Series
 
 
  Seminar Schedule (Seminar Organizer: Prof. Ziv Bar-Joseph)
 
Date: May 4, 2006
Time: 2:30 PM - 3:30 PM
Location: 3305 Newell-Simon Hall
Speaker: Nachiketa Sahoo, MS Candidate
Title: Incremental Hierarchical Clustering of Text Documents
Abstract: Incremental hierarchical text document clustering algorithms are important for organizing documents generated by streaming online sources such as newswire and blogs. However, this is a relatively unexplored area in the text document clustering literature. Popular incremental hierarchical clustering algorithms, namely Cobweb and Classit, have not been applied to text document data. We discuss why, in their current form, these algorithms are not suitable for text clustering and propose an alternative formulation. This includes changes to the underlying distributional assumption of the algorithm so that it conforms to the empirical data. Both the original Classit algorithm and our proposed algorithm are evaluated using Reuters newswire articles and the Ohsumed dataset, and the gain from using a more appropriate distribution is demonstrated.