Streaming Graph Sampling with Size Restrictions

Abstract

Many graph datasets originating from online social network, financial or biological sources are too large to store or analyze. The analysis of such networks may be made more tractable if they are reduced to smaller subgraphs via sampling. While most of the known graph sampling methods are designed with static graphs in mind, many real datasets are massive and rapidly growing, making streaming methods necessary. We present two new techniques, Randomly Induced Edge Sampling (RIES) and Weighted Edge Sampling (WES). Both methods sample a stream of edges in a single pass, without the need to know future properties of the stream. In contrast to previous work that focused on limiting only the number of vertices, our methods restrict the number of edges, thus truly limiting the size of the sampled subgraph. We compare the performance of RIES and WES against the previously known streaming Random Edge (RE) method on eight social network datasets. Using four structural graph properties, we find that both RIES and WES produce subgraphs that are more structurally similar to the original graph than are the subgraphs produced by streaming RE. We also examine the sensitivity of the two algorithms with respect to their parameters. The parameters of WES affect its performance in a more predictable manner and are easier to set. Both new algorithms represent an improvement in the available streaming graph analysis toolkit.

Publication
Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, Sydney, Australia, July 31 - August 03, 2017