Recently, distributed processing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, GraphLab, and Trinity.
These systems can be divided into two categories: (1) vertex-centric and (2) block-centric approaches. In vertex-centric approaches, each vertex corresponds to a process, and message are exchanged among vertices. In block-centric approaches, the unit of computation is a block, a connected subgraph of the graph, and message exchanges occur among blocks. In this work, we are considering the issues of scale and dynamism in the case of block-centric approaches. We present BLADYG, a block-centric framework that addresses the issue of dynamism in large-scale graphs.
BLADYG is implemented on top of AKKA framework.
BLADYG System overview
We have applied BLADYG framework to the problem of distributed k-core decomposition in large dynamic graphs. We have performed a set of experiments to evaluate the effectiveness and efficiency of BLADYG framework on a number of different real and synthetic datasets.
We have implemented BLADYG on top of the akka framework, a toolkit and runtime for building highly concurrent, distributed, resilient message-driven applications. In order to evaluate the performance of our approach, we used 9 m3.medium instances on Amazon EC2. Each m3.medium instance contained 1 virtual 64-bit CPU, 3.75 GB of main memory a 4 GB of local instance storage.
We compared our BLADYG solution for k-core decomposition with a HBase-based solution proposed by Aksu et al. [1]. The HBase-based solution was tested using 9 m3.medium instances on Amazon EC2: 1 acting as hmaster, namenode and zookeeper node, 8 as datanode and hregion.
In order to evaluate and test the effectiveness of BLADYG, we performed an extensive set of experiments on a number of different real and synthetic datasets.
Dataset | Type | Number of nodes (N) | Number of edges (M) | Diameter | Average clustering coefficient | Max(k) |
---|---|---|---|---|---|---|
DS1 | Synthetic | 10,000 | 70,622 | 4 | 0.3977 | 33 |
DS2 | Synthetic | 20,000 | 144,741 | 4 | 0.3935 | 38 |
DS3 | Synthetic | 50,000 | 365,883 | 4 | 0.3929 | 42 |
DS4 | Synthetic | 100,000 | 734,416 | 4 | 0.3908 | 46 |
ego-Facebook | Real | 4,039 | 88,234 | 8 | 0.6055 | 115 |
email-Enron | Real | 36,692 | 183,831 | 11 | 0.4970 | 43 |
roadNet-TX | Real | 1,379,917 | 1,921,660 | 1,054 | 0.0470 | 3 |
roadNet-CA | Real | 1,965,206 | 2,766,607 | 849 | 0.0464 | 3 |
com-LiveJournal | Real | 3,997,962 | 34,681,189 | 17 | 0.2843 | 296 |
soc-LiveJournal1 | Real | 4,847,571 | 68,993,773 | 16 | 0.2742 | 318 |
Real datasets are provided by Stanford University (SNAP). Synthetic datasets were created using the graph data generator proposed by [2] [3]. |
||||||