| Abstract: |
The main focus of this proposal is on understanding and analyzing entity relationships in
large social networks. The broad range of applications of graph based learning problems includes collaborative filtering in recommender networks, link prediction in social networks (e.g. predicting future links from the current snapshot of a graph), fraud detection and personalized graph search. In all these real world problems the main question is: given a node, which other nodes are similar to it?
We use proximity measures based on short term random walks, namely truncated hitting and commute times, to compute approximate nearest neighbors in large graphs. In previous work we have combined sampling with deterministic pruning to retrieve top k neighbors of a query incrementally without caching information about all nodes. Our algorithms can answer nearest neighbor queries on graphs of size 600,000 nodes in 3 seconds on average on a single CPU machine. We have shown that on several link prediction
tasks on real world datasets these measures outperform other popular proximity measures (e.g.
personalized pagerank) in terms of predictive power.
In this thesis we propose to analyze the short-term behavior of random walks to improve upon our algorithm, investigate other graph-based proximity measures, and apply these algorithms on real-world tasks such as semi-supervised learning, information retrieval, recommender networks, and a meta-learning approach for link prediction.Thesis Committee: Andrew Moore (Chair), Geoff Gordon, Anupam Gupta, Jon Kleinberg (Cornell)
http://www.cs.cmu.edu/~psarkar/proposal/sarkar_thesis_proposal.pdf
|