Google’s use of the mathematical Markov method in its PageRank algorithm may leave it susceptible to link spamming techniques, US mathematicians have found.
Timothy Chartier, Amy Langville, and their colleagues at Davidson College and the College of Charleston studied the Colley, Massey and Markov methods for rating sports teams.
In a “perfect season” – in which teams faced each competitor exactly once and rankings were clear – the Colley and Massey methods produced uniformly spaced ratings in the expected order.
Meanwhile, the Markov method yielded a non-uniformly spaced long-tailed distribution, in which the ratings of low-ranking teams were highly sensitive to any changes in input data.
The Markov method relied on a “vote-casting” concept, where the loser voted for the winner. Researchers said results displayed "extreme interdependence".
A single instance of a weaker team beating a stronger team sometimes caused the rankings of all teams to be rearranged under the Markov method, while only eliciting an isolated response in Colley and Massey results.
Such ‘perturbations’ occurred whenever a high-ranked webpage linked to a low-ranked one, the researchers reported in the SIAM Journal on Scientific Computing this month.
“This event is precisely the mechanism behind several common link-spamming techniques that aim to unjustly boost the rank of a particular page for mercantile reasons,” they wrote.
“In the Markov method, if a low-rank page can convince a high-ranked page to hyperlink to it, then this endorsement can catapult the lowly page several rungs up the ranked list.”
The Markov distribution was less sensitive outside of its long tail, so top search results would likely remain unaffected by perturbations.
Typical search engine users would thus receive accurately ranked results; however, the researchers noted that search engines pulled webpages from the long tail to answer esoteric search queries.
The researchers recommended that the Markov method be carefully studied, since “sensitivity can result in questionable rankings”.
“Both web page authors and teams sometimes try to game, or spam, ranking systems to achieve a higher ranking,” Langville stated.
“Mathematically, such spamming can be viewed as changes to the input data required by the ranking method.
“As future work, we are exploring the use of the Colley and Massey methods in other settings beyond sports,” she said.
“For example, we have found that these two methods are more appropriate than PageRank for ranking in social networks such as Twitter.”