BLADYG

- BLADYG
  A Novel Block-Centric Framework for the Analysis of LArge DYnamic Graphs
Summary | System Overview | Results | Downloads

Summary:

Recently, distributed processing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, GraphLab, and Trinity. These systems can be divided into two categories: (1) vertex-centric and (2) block-centric approaches. In vertex-centric approaches, each vertex corresponds to a process, and message are exchanged among vertices. In block-centric approaches, the unit of computation is a block, a connected subgraph of the graph, and message exchanges occur among blocks. In this work, we are considering the issues of scale and dynamism in the case of block-centric approaches. We present BLADYG, a block-centric framework that addresses the issue of dynamism in large-scale graphs. BLADYG is implemented on top of AKKA framework.

System overview:

BLADYG

BLADYG System overview

Results:

We have applied BLADYG framework to the problem of distributed k-core decomposition in large dynamic graphs. We have performed a set of experiments to evaluate the effectiveness and efficiency of BLADYG framework on a number of different real and synthetic datasets.

Experimental environment

We have implemented BLADYG on top of the akka framework, a toolkit and runtime for building highly concurrent, distributed, resilient message-driven applications. In order to evaluate the performance of our approach, we used 9 m3.medium instances on Amazon EC2. Each m3.medium instance contained 1 virtual 64-bit CPU, 3.75 GB of main memory a 4 GB of local instance storage.
We compared our BLADYG solution for k-core decomposition with a HBase-based solution proposed by Aksu et al. [1]. The HBase-based solution was tested using 9 m3.medium instances on Amazon EC2: 1 acting as hmaster, namenode and zookeeper node, 8 as datanode and hregion.

Datasets

In order to evaluate and test the effectiveness of BLADYG, we performed an extensive set of experiments on a number of different real and synthetic datasets.

Datasets

Dataset	Type	Number of nodes (N)	Number of edges (M)	Diameter	Average clustering coefficient	Max(k)
DS1	Synthetic	10,000	70,622	4	0.3977	33
DS2	Synthetic	20,000	144,741	4	0.3935	38
DS3	Synthetic	50,000	365,883	4	0.3929	42
DS4	Synthetic	100,000	734,416	4	0.3908	46
ego-Facebook	Real	4,039	88,234	8	0.6055	115
email-Enron	Real	36,692	183,831	11	0.4970	43
roadNet-TX	Real	1,379,917	1,921,660	1,054	0.0470	3
roadNet-CA	Real	1,965,206	2,766,607	849	0.0464	3
com-LiveJournal	Real	3,997,962	34,681,189	17	0.2843	296
soc-LiveJournal1	Real	4,847,571	68,993,773	16	0.2742	318
Real datasets are provided by Stanford University (SNAP). Synthetic datasets were created using the graph data generator proposed by [2] [3].

Scalability

We mention that the presented runtime values of the HBase-based approach correspond to the maintenance time of only one fixed k value core (k = max(k) in our experimental study). This means that, for each dataset, the maintenance process of the HBase-based approach needs to be repeated max(k) times in order to achieve the same results as our approach.

Downloads:

BLADYG source code is downloadable here.
Our Java implementation of the HBase-based [1] approach is available here.

Related publications:

S. Aridhi, A. Montresor, Y. Velegrakis. BLADYG: A Graph Processing Framework for Large Dynamic Graphs. Big Data Research (BDR), Elsevier, 9(C), pp. 9-17, 2017.
S. Aridhi, M. Brugnara,A. Montresor, Y. Velegrakis. Distributed k-core Decomposition and Maintenance in Large Dynamic Graphs. In Proceedings of the 10th ACM International Conference on Distributed and Event-Based Systems (DEBS'16), pp. 161-168, Irvine, 2016.
C. Sakouhi, S. Aridhi, A. Guerrieri, S. Sassi and A. Montresor. DynamicDFEP: A distributed edge partitioning approach for large dynamic graphs. In Proceedings of the 20th International Database Engineering & Applications Symposium (IDEAS'16), pp. 61-168, Montreal, Canada, 2016.
Aridhi S., Montresor A., Velegrakis Y. BLADYG: A novel block-centric framework for the analysis of large dynamic graphs. In Proceedings of the ACM Workshop on High Performance Graph Processing (HPGP'16) @ HPDC'16, pp. 39-42, Kyoto, Japan, 2016.

References:

[1] Aksu H., Canim M., Yuan-Chi Chang, Korpeoglu I., Ulusoy O. Distributed k-Core View Materialization and Maintenance for Large Dynamic Graphs. IEEE Transactions on Knowledge and Data Engineering, vol.26, no.10, pp.2439-2452, Oct. 2014
[2] Alessandra Sala, Lili Cao, Christo Wilson, Robert Zablit, Haitao Zheng, and Ben Y. Zhao. Measurement-calibrated Graph Models for Social Network Experiments. Proceedings of The 19th World Wide Web Conference (WWW 2010). ACM, New York, NY, USA, 861-870.
[3] Alessandra Sala, Xiaohan Zhao, Christo Wilson, Haitao Zheng, and Ben Y. Zhao. Sharing Graphs using Differentially Private Graph Models. Proceedings of The 11th ACM SIGCOMM Internet Measurement Conference (IMC 2011). ACM, New York, NY, USA, 81-98.