School of Computer Science, carnegie Mellon
Extended Menu Contact Info Directory About SCS Careers Giving to SCS SCS Dean's Advisory Board
CALENDAR OF EVENTS

 

 SCS Calendar Events

 Search for Events by Date

 Submit an Event to the SCS Calendar

When: Thursday, April 19, 4:30 p.m.

Where: 6115 Gates and Hillman Centers

Sanjoy Dasgupta,
Associate Professor
Computer Science and Engineering
University of California, San Deigo

Machine Learning/Google Seminar

Abstract:
Annotation on the Cheap

We consider the task of labeling an entire data set, given the ability to query the labels of specific points within it.

Specifically, we define the "finite population annotation" (FPA) problem as follows. The input consists of: a set of n data points, each of which has an associated label that is initially missing; and parameters delta, epsilon. The algorithm is given the ability to ask for any of the missing labels, and has to produce a set of n labels such that with probability at least 1-delta, at most an epsilon fraction of them are incorrect. The goal is to meet this statistical requirement without asking for very many labels.

This problem is motivated by scenarios in e-discovery and active learning. We describe three algorithms that solve it by exploiting different types of structure in the data. In each case, we characterize the types of data for which the algorithm requires few labels, and we give experimental illustrations of its behavior.

This is joint work with David Lewis and Matus Telgarsky.

Sanjoy Dasgupta develops algorithms for the statistical analysis of high-dimensional data. Such data is now widespread, in domains ranging from environmental modeling to genomics to web search.

Prior to joining the UCSD Jacobs School in 2002, professor Dasgupta was a senior member of the technical staff at AT&T Labs-Research, where his work focused on algorithms for data mining, with applications to speech recognition and to the analysis of business data. Professor Dasgupta received a Ph.D. in Computer Science in 2000 from UC Berkeley and a B.A. in Computer Science from Harvard in 1993. He is a member of the editorial boards of the Journal of Machine Learning Research, the Journal of Artificial Intelligence Research, and the Machine Learning Journal.

Faculty Host: Steve Hanneke
Appointments: sharonw@cs.cmu.edu

Poster

<< Back