GraphMR: Distributed Graph Match Method using MapReduce
This work is a distributed graph match method on large data sets. This work was submitted to IEEE Transactions on Kowledge and Data Engineering (TKDE).
Data Generation
This work includes a random graph generator tool. The graph generator is an implementation of the random graph model (Erdős and Rényi 1959) in MapReduce. A graph is constructed in a distributed way across a number of cluster nodes. Therefore, we can get large graphs in a scalable manner from this generator. Besides, It enables users to specify various parameters as follows:
- -directed or undirected graphs
- -the number of vertices
- -average of degree
- -the type of attributed graphs
- -vertices only has attributes
- -vertices and edges has attributes
- -the number of distinct vertex attributes
- -the number of distinct edge attributes
Random Graph Model
There are two standard models for generating random graphs. One is G(n, m) model, where n is the number of vertices and m is the number of edges. This model was proposed by Erdős and Rényi [1]. In order to generate a random graph, this model randomly chooses m edges among nC2 possible edges. Another one is G(n,p), where n is the number of vertices and p is probability of an edge between two vertices among all possible edges. This model was first introduced by E. Gilbert [2]. Our random graph generator provides both models.
Usages
How to Build
Unpack the downloaded source code and build the source as follows:
$ tar xzvf graphmr-0.1-SNAPSHOT.tar.gz $ cd graphmr-0.1-SNAPSHOT $ mvn assembly:assembly -DskipTests $ ls -l target/graphmr-0.1-SNAPSHOT.jar target/graphmr-0.1-SNAPSHOT.jar
Options
The following command prints out various options.
$ hadoop jar graphmr-0.1-SNAPSHOT.jar gen usage: Usage: graphmr.generator.GraphGenerator -a the number of distinct vertex attributes -d,--directed directed graph or undirected (default: undirected) --debug debug mode (default: disabled:) -e the number of distinct edge attributes --expo a exponent value- this option is only used in zipf -k average of degree -l,--labeled none - no label, v - only vertex label, ve - vertex and edge labels (default) --labeldist random - uniform distribution (default), zipf - zipf distribution -m the number of map tasks --model which model: erdos (Erdos and Reiny model), gilbert (a variation of Erdos and Reiny Model), and zipf -o,--out output filename -p,--prob a probability - this option is only used in gilbert -r the number of reduce tasks -t,--text the program outputs vertices as text format
Examples
generating a undirected graph with 10000 vertices and 5 average degree.
# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 10000 -o out
generating a undirected graph with 10000 vertices and 10 distinct vertex
# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 -a 10 10000 -o out
generating a undirected graph with 10000 vertices, 10 distinct vertex attributes, and 20 distinct edge attributes.
# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 -a 10 -e 20 10000 -o out
generating a directed graph with 10000 vertices
# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -d -o out 10000
Source Code
References
- Erdős, P. Rényi, A. “On Random Graphs I” in Publ. Math. Debrecen 6, p. 290-297, 1959.
- Gilbert, E.N. “Random Graphs”. Annals of Mathematical Statistics 30: 1141–1144, 1959.
- Erdős-Rényi model, Wikipedia