| Machine Learning Seminar Series |
| Date: | September 8, 2008 |
| Time: | 4:30 PM |
| Location: | Lower Level Collaborative Innovation Center |
| Speaker: | Jeremy Kubica, Software Engineer, Google Pittsburgh |
| Title: | Extreme Data Mining: The Google Adwords and Adsense Data Mining Infrastructure |
| Abstract: |
The Adwords and Adsense systems at Google provide a continuous, global, high-throughput flow of data. Using this data accurately and efficiently is important to the success of these systems, but it presents significant computational challenges. In this talk I will discuss the challenges of data mining at such extremely high data volumes and some of the techniques employed to address them.
I will focus primarily on the challenges of working with extremely high data volumes, where even simple computational problems rapidly become non-trivial. I will discuss how these challenges influence development at every level of the software stack, from low-level infrastructure to high-level application design. At each level of the stack, I will describe how these challenges affect the way the problem is approached and some of the techniques that were developed to address them. I will also discuss some algorithmic, statistical, and data mining challenges that arise in such high-density domains. |
| Speaker Bio: | Jeremy is a software engineer at Google. He is a recent graduate of the Ph.D. program at Carnegie Mellon University's Robotics Institute. Previously he received a B.S. in Computer Science from Cornell University and an M.S. in Robotics from Carnegie Mellon University. |
