| Abstract: |
Graphs show up in a surprisingly diverse set of disciplines, ranging from computer networks to sociology, biology, ecology and many more. What do
such "normal" graphs look like? How can we spot abnormal subgraphs within them? Which nodes/edges are "suspicious"? How does a virus spread over a graph? Answering these questions is vital for outlier detection (such as finding terrorist cells or money-laundering rings), forecasting, simulations (how well will a new protocol work on a realistic computer network?), immunization campaigns and many other applications.
We attempt to answer these questions in two parts. First, we investigate recurring patterns in real-world graphs, to gain a deeper understanding of
their structure. This leads to the development of the R-MAT model of graph generation for creating synthetic but "realistic" graphs, which match
many of the patterns found in real-world graphs, including power-law and lognormal degree distributions, small diameter and "community" effects. The second part of our work is targeted at applications: what patterns/properties of a graph are important for solving specific problems? Here, we investigate the propagation behavior of a computer virus over a network, and find a simple formula for the epidemic threshold (beyond which any viral outbreak might become an epidemic). We also
develop a scalable, parameter-free method for finding groups of "similar" nodes in a graph, corresponding to homogeneous regions (or CrossAssociations) in the binary adjacency matrix of the graph. This can
help navigate the structure of the graph and find non-obvious patterns. Our proposed work involves extending these ideas and applying these
principles to other areas. |