Spring 2008 Seminar

 
 
       
  Machine Learning Seminar Series
 
 
  Seminar Schedule (Seminar Organizer: Prof. Ziv Bar-Joseph)
 

 

ML/Google Seminars

 

 

Date: March 3, 2008
Time: 4:30 PM - 5:30 PM (Refreshments: 4:15 PM)
Location: 7500 Wean Hall
Speaker: Andrew McCallum, Associate Professor
Computer Science Department
University of Massachusetts, Amherst
Title: Injecting Prior Domain Knowledge into Machine Learning
Abstract: Although we often have extensive prior knowledge about how to solve a problem, machine learning is usually performed tabula rasa. I contend that this is because many traditional machine learning methods do not provide natural avenues for injecting prior knowledge. In this talk I will describe mechanisms for incorporating the knowledge of a human domain expert into model selection, inference and parameter estimation.

I will briefly review how conditional random fields (CRFs) support a domain expert's ability to leverage arbitrary features of the input, unfettered by concerns about independence assumptions. Then I will describe the tremendous challenges of inference in CRFs defined by probabilistic weighted first-order logic, and how inference methods based on Metropolis-Hastings provide a robust mechanism for injecting domain knowledge to make inference more efficient. Finally, I will introduce recent work in Generalized Expectation criteria, a framework for defining parameter estimation objective functions based on expectations arising out of our prior knowledge. One of its manifestations leads to a semi-supervised training regime in which, rather than labeling instances, the user labels features, resulting in a factor of 10 reduction in labeling time to reach equivalent accuracy.

Joint work with colleagues at UMass: Charles Sutton, Aron Culotta, Greg Druck, Ben Wellner, Michael Wick, Rob Hall, and Gideon Mann.

Bio: Andrew McCallum is an Associate Professor and Director of the Information Extraction and Synthesis Laboratory in the Computer Science Department at the University of Massachusetts Amherst. He was previously Vice President of Research and Development at WhizBang Labs, a company that used machine learning for information extraction from the Web. In the late 1990s he was a Research Scientist and Coordinator at Justsystem Pittsburgh Research Center, where he spearheaded the creation of CORA, an early research paper search engine that used machine learning for spidering, extraction, classification and citation analysis. McCallum received his PhD from the University of Rochester in 1995, followed by a post-doctoral fellowship at Carnegie Mellon University. He is the recipient of two NSF ITR awards, the UMass NSM Distinguished Research Award, the UMass Lilly Teaching Fellowship, and the IBM Faculty Partnership Award. He is the Program Co-chair for the International Conference on Machine Learning (ICML) 2008, and a member of the boards of the International Machine Learning Society and the CRA Computing Community Consortium, and of the editorial board of the Journal of Machine Learning Research. For the past ten years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, co-reference, document classification, clustering, finite state models, semi-supervised learning, and social network analysis. New work on search and bibliometric analysis of open-access research literature can be found at http://rexa.info.

Host: Google Pittsburgh
Appointments? Contact Cathy Serventi (serventi@google.com)