| Abstract: |
This proposal concerns new statistical machine learning approaches for detecting evolving relationships among large numbers of entities. One example is the set of relations among co-authors in Citeseer. Three major recent developments make this an important area of study. First, there has been a dramatic increase in the collection of data that can be used to derive underlying social-network-like structures. Second, the recent rapid growth in probabilistic and statistical approaches to tractable machine learning means that it may be possible to analyze networks with millions of entities. Third, there are immediate uses for this technology in business, security and the social sciences. By contrast, statistical models in the traditional social network literature can manage at most a few hundred nodes.
In this thesis I plan to develop a new stochastic model that describes the relations in a social network and how those relations evolve over time. Initial explorations led to SBNS, an algorithm for structural search of Bayesian Networks from sparse data (Goldenberg and Moore, 2004). SBNS is a scalable search procedure that learns Bayes Nets from binary event data, i.e. the estimation is based solely on information about which entities (variables) participated in each of a given set of events (records). I plan to build on this work by incorporating entity-specific properties into the model, such as profession or location (if the entities are people), so that the strength of the dependencies reflects the secondary information available about the entities. I also plan to study classes of queries for which inference can be scaled to answer questions beyond the dependencies evident from the structure. Finally, I plan to extend the model to account for the evolution of social networks over time. |