Software > Similarity
Flooding Algorithm
Similarity flooding is a matching algorithm
based on fix-point computation that is usable across different scenarios
[Melnek, S., et. al. 2002]. The algorithm takes
two graphs in Pajek format and produces as output a mapping between corresponding
nodes of the graphs. Depending on the matching goal, a subset of the mapping
is chosen using a set of filters. The percentage of accurate match done
by the algorithm is a measure of its accuracy. The resulting output is then converted into a Pajek input file, which can then be used to easily visualise the similarities and differences between the datasets using appropriate color coding. The amount of human intervention required in correcting the results is used as a measure of accuracy.
The main advantage of Similarity Flooding lies in its ability to be used across a wide variety of datasets. The algorithm works independent of the complexities of the underlying data and utilizes the relationship between the data points to estimate similarity. The disadvantage of Similarity Flooding is its dependence on human intervention to compute accurate matches.
It is general graph matching algorithm that can be used to map systems that can be represented as graphs.
The algorithm is implemented is Java. The code was customized to read the pajek input format. The algorithm performs a similarity match across all the network nodes. It generates a similarity measurement between all nodes of the two network. The resultant pajek file can be generated using a particular threshold value.
The original code is available for download at http://www-db.stanford.edu/~melnik/mm/sfa/ The orginal code has been modified to accept the pajek input format. The modified code can be downloaded here To run At the command prompt, type: java com.interdatanetworking.mm.alg.Match <sample1PajekInput.txt> <sample2PajekInput.txt> <results.txt>where,
From the similarity results file, it can be infered that the resultant similarity between node 1 in the first network and that in the 2nd is 1.0, while that between 1 and 2 is 0.193, etc. Further filtering can be applied by using threshold values for the similarity measure by using the follwing perl script (simPajekGen.pl). To run the script, useperl simPajekGen.pl <sim-results file> <nw1> <nw2> <output file> <threshold value>
This documentation was compiled by Lalitha Viswanath, Ketan Mane and Katy Börner. |