| Date: | February 9, 2005 |
| Time: | 10:00 AM - 11:00 AM (Refreshments) |
| Location: | 3305 Newell-Simon Hall |
| Speaker: |
Susan Murphy, H.E. Robbins Professor of Statistics & Research Professor, |
| Title: | A Finite Sample Upper Bound on the Generalization Error for Q-Learning |
| Abstract: |
Multi-stage decision problems arise with increasing frequency, both in the management of chronic, relapsing disorders such as drug dependence and in the maintenance of achieved health behavior changes with the goal of preventing future illness. This is primarily because the implementation of good policies mimics the adaptive nature of clinical practice, so the construction of good policies is a vehicle for improving such practice. Constructing good policies in these settings is typically challenging because (1) only a training set of finite-horizon trajectories of observations and actions is available for learning and (2) the observations are of high dimension. Additionally, in many settings the proposed policy must make sense to the clinician, the patient, or both.
Our goal is to construct a good policy in a restricted policy class. Given the nature of the training set, a natural approach to learning a policy is Q-learning with function approximation. Q-learning is an off-policy method: the actions in the training set can be chosen by a non-optimal stochastic policy. Function approximation is also natural because the observations are of high dimension. Of course, the chosen approximation space implicitly (further) constrains the class of policies.
We derive finite-sample error bounds for Q-learning with function approximation and discuss the mismatch between the method and the goal of producing the best decision rules within a restricted class. We also discuss ideas for reducing this mismatch. |
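
For readers unfamiliar with the setting in the abstract, the following is a minimal sketch of batch Q-learning with linear function approximation on two-stage trajectories. It is an illustration only, not the speaker's method or analysis: the two-stage horizon, the binary action set, the feature basis, the function names, and the synthetic data are all assumptions made here for the example.

```python
import numpy as np

ACTIONS = (-1.0, 1.0)  # assumed binary action set, not from the abstract

def features(obs, action):
    """Hypothetical linear basis: intercept, observation, action, interaction."""
    obs = np.asarray(obs, dtype=float)
    action = np.asarray(action, dtype=float)
    return np.column_stack([np.ones_like(obs), obs, action, obs * action])

def fit_q(obs, action, target):
    """Least-squares fit of a linear Q-function to regression targets."""
    theta, *_ = np.linalg.lstsq(features(obs, action), target, rcond=None)
    return theta

def q_values(theta, obs):
    """Q(obs, a) for every candidate action; shape (n, len(ACTIONS))."""
    obs = np.asarray(obs, dtype=float)
    cols = [features(obs, np.full(len(obs), a)) @ theta for a in ACTIONS]
    return np.column_stack(cols)

def greedy_action(theta, obs):
    """Decision rule implied by a fitted Q-function."""
    return np.asarray(ACTIONS)[q_values(theta, obs).argmax(axis=1)]

def batch_q_learning(o1, a1, r1, o2, a2, r2):
    """Backward induction over a two-stage training set."""
    # Final stage: regress the last reward on (observation, action).
    theta2 = fit_q(o2, a2, r2)
    # First stage: the target adds the best fitted final-stage value.
    target1 = r1 + q_values(theta2, o2).max(axis=1)
    theta1 = fit_q(o1, a1, target1)
    return theta1, theta2

# Synthetic training set (assumed): actions come from a uniformly random,
# hence non-optimal, stochastic behavior policy. This is the off-policy setting.
rng = np.random.default_rng(0)
n = 500
o1 = rng.normal(size=n)
a1 = rng.choice(ACTIONS, size=n)
r1 = np.zeros(n)
o2 = o1 + 0.5 * a1 + rng.normal(scale=0.5, size=n)
a2 = rng.choice(ACTIONS, size=n)
r2 = o2 * a2 + rng.normal(scale=0.1, size=n)

theta1, theta2 = batch_q_learning(o1, a1, r1, o2, a2, r2)
```

In this sketch the learned policy is whatever `greedy_action` returns for the fitted parameters, so the choice of basis in `features` determines which decision rules can be represented. That is one concrete way to see the abstract's remark that the approximation space implicitly constrains the policy class.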