Spring 2005 Seminar

Machine Learning Seminar Series
 
Seminar Schedule (Seminar Organizer: Prof. Ziv Bar-Joseph)
 
Date: February 9, 2005
Time: 10:00 AM - 11:00 AM (Refreshments)
Location: 3305 Newell-Simon Hall
Speaker: Susan Murphy, H.E. Robbins Professor of Statistics & Research Professor,
Title: A Finite Sample Upper Bound on the Generalization Error for Q-Learning
Abstract: Multi-stage decision problems arise with increasing frequency, both in the management of chronic, relapsing disorders such as drug dependence and in the maintenance of achieved health behavior changes aimed at preventing future illness. This is primarily because good policies mimic the adaptive nature of clinical practice, so constructing good policies is a vehicle for improving such practice. Constructing good policies in these settings is typically challenging because (1) only a training set of finite-horizon trajectories of observations and actions is available for learning and (2) the observations are high-dimensional. Additionally, in many settings the proposed policy must make sense to the clinician, the patient, or both. Our goal is to construct a good policy within a restricted policy class. Given the nature of the training set, a natural approach to learning a policy is Q-learning with function approximation. Q-learning is an off-policy method: the actions in the training set may be chosen by a non-optimal stochastic policy. Function approximation is also natural, since the observations are high-dimensional. Of course, the chosen approximation space implicitly (further) constrains the class of policies. We derive finite-sample error bounds for Q-learning with function approximation and discuss a mismatch between the method and the goal of producing the best decision rules within a restricted class. We discuss ideas for reducing this mismatch.
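The off-policy, finite-horizon setting described in the abstract can be illustrated with a minimal sketch of batch Q-learning with linear function approximation, fit by backward induction over stages. This is a hypothetical illustration, not the speaker's method: the feature map `features`, the binary action set `ACTIONS`, the ridge-penalized least-squares fit, and the simulated two-stage data produced by `simulate` are all assumptions. Actions in the simulated data come from a 50/50 random policy, and each regression target takes a maximum over next-stage actions instead of using the action actually recorded, which is what makes the procedure off-policy.

```python
import random

ACTIONS = (0, 1)  # hypothetical binary treatment choice


def features(obs, action):
    """Hypothetical linear features phi(o, a): intercept, obs, action, interaction."""
    return [1.0, obs, float(action), action * obs]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam=1e-6):
    """Least-squares fit of y on X; the small ridge term keeps X^T X invertible."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)


def q_value(theta, obs, action):
    """Linear Q-function estimate: theta . phi(obs, action)."""
    return sum(t * f for t, f in zip(theta, features(obs, action)))


def fitted_q_learning(trajectories, horizon):
    """Backward-induction Q-learning with linear function approximation.

    trajectories: list of lists of (obs, action, reward) tuples, one per stage.
    Returns one coefficient vector theta_t per stage t = 0..horizon-1.
    """
    thetas = [None] * horizon
    for t in reversed(range(horizon)):
        X, y = [], []
        for traj in trajectories:
            obs, action, reward = traj[t]
            target = reward
            if t + 1 < horizon:
                next_obs = traj[t + 1][0]
                # Off-policy target: maximize over next-stage actions rather than
                # using the action the behavior policy actually took.
                target += max(q_value(thetas[t + 1], next_obs, a) for a in ACTIONS)
            X.append(features(obs, action))
            y.append(target)
        thetas[t] = ridge_fit(X, y)
    return thetas


def policy(thetas, t, obs):
    """Greedy decision rule at stage t: argmax over a of Q_t(obs, a)."""
    return max(ACTIONS, key=lambda a: q_value(thetas[t], obs, a))


def simulate(n, rng):
    """Hypothetical two-stage data; reward favors action 1 when obs > 0."""
    data = []
    for _ in range(n):
        traj = []
        for _stage in range(2):
            obs = rng.uniform(-1.0, 1.0)
            action = rng.choice(ACTIONS)  # non-optimal stochastic behavior policy
            reward = action * obs + rng.gauss(0.0, 0.1)
            traj.append((obs, action, reward))
        data.append(traj)
    return data


rng = random.Random(0)
thetas = fitted_q_learning(simulate(500, rng), horizon=2)
print([policy(thetas, t, o) for t in (0, 1) for o in (0.5, -0.5)])
```

With this linear feature map, each learned decision rule has the form "choose action 1 when theta[2] + theta[3] * obs > 0", a single threshold on the observation, which shows concretely how the approximation space implicitly restricts the policy class. Note also that the least-squares step minimizes prediction error for Q rather than directly maximizing the value of the resulting policy; this is one plausible way to read the mismatch the abstract refers to.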