The sum of my page ranks converges at 0.9

Question

I'm calculating PageRank for a set of crawled domains, using a damping factor of 0.85. As many PageRank papers mention, the sum of the PageRank values should converge to 1. But no matter how many iterations I run, the sum seems to converge at 0.90xxx. If I lower the damping factor to 0.5, the sum moves closer to 1, as one would expect.

Is it bad that the PageRank sum converges at 0.90, and what does this generally imply?

Recommended answer

This is the algorithm I ended up with:

// data structures

private HashMap<String, Double> pageRanks;
private HashMap<String, Double> oldRanks;
private HashMap<String, Integer> numberOutlinks;
private HashMap<String, HashMap<String, Integer>> inlinks;
private HashSet<String> domainsWithNoOutlinks;
private double N;

// data parsing omitted

public void startAlgorithm() {
    int maxIterations = 20;
    int itr = 0;
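    double d = 0.85;            // damping factor
    double dp = 0;              // per-domain share of the rank held by domains with no outlinks
    double dpp = (1 - d) / N;   // per-domain teleport term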

    // initialize pagerank
    for (String s : oldRanks.keySet()) {
        oldRanks.put(s, 1.0 / N);
    }

    System.out.println("Starting page rank iterations..");
    while (itr < maxIterations) {
        System.out.println("Iteration: " + itr);
        dp = 0;
        // rank held by domains with no outlinks, spread evenly over all N domains
        for (String domain : domainsWithNoOutlinks) {
            dp = dp + d * oldRanks.get(domain) / N;
        }
        for (String domain : oldRanks.keySet()) {
            pageRanks.put(domain, dp + dpp);

            for (String inlink : inlinks.get(domain).keySet()) { // for every inlink of domain
                pageRanks.put(domain, pageRanks.get(domain) + inlinks.get(domain).get(inlink) * d * oldRanks.get(inlink) / numberOutlinks.get(inlink));
            }
        }
        // update pageranks with new values
        for (String domain : pageRanks.keySet()) {
            oldRanks.put(domain, pageRanks.get(domain));
        }

        itr++;
    }
}
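
One way to notice lost rank early is to sum the values after each iteration. The helper below is only a minimal sketch: the class name `RankSumCheck`, the method names, and the example domains are hypothetical and not part of the original code.

```java
import java.util.HashMap;
import java.util.Map;

public class RankSumCheck {

    // Adds up all rank values in the map.
    static double sumOfRanks(Map<String, Double> ranks) {
        double sum = 0.0;
        for (double rank : ranks.values()) {
            sum += rank;
        }
        return sum;
    }

    // True when the ranks add up to 1 within the given tolerance.
    static boolean sumsToOne(Map<String, Double> ranks, double tolerance) {
        return Math.abs(sumOfRanks(ranks) - 1.0) <= tolerance;
    }

    public static void main(String[] args) {
        // Made-up example values, for illustration only.
        Map<String, Double> ranks = new HashMap<>();
        ranks.put("a.example", 0.5);
        ranks.put("b.example", 0.25);
        ranks.put("c.example", 0.25);
        System.out.println("sum = " + sumOfRanks(ranks));
        System.out.println("sums to one: " + sumsToOne(ranks, 1e-9));
    }
}
```

Calling something like `sumsToOne(pageRanks, 1e-6)` at the end of every iteration would show whether rank is leaking from the very first pass or only drifting through rounding.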

The important line is this one:

pageRanks.put(domain, pageRanks.get(domain) + inlinks.get(domain).get(inlink) * d * oldRanks.get(inlink) / numberOutlinks.get(inlink));

inlinks.get(domain).get(inlink) returns how many times the inlinking domain links to (references) the current domain, and we divide that by the total number of outlinks of the inlinking domain. The factor inlinks.get(domain).get(inlink) is what I had missed in my algorithm, which is why the sum didn't converge at 1. Assuming numberOutlinks counts every link, repeats included, a domain that links to the same target several times would otherwise pass on only one link's share of its rank to that target, and the rest is lost on every iteration.
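
The effect of the missing factor can be reproduced on a tiny graph. The following is a minimal sketch, not the original implementation: it uses arrays instead of hash maps, and the class name `PageRankSumDemo`, the `LINKS` matrix, and the `rankSum` method are made up for illustration. It assumes each domain's outlink count includes repeated links to the same target.

```java
import java.util.Arrays;

public class PageRankSumDemo {

    // Hypothetical toy graph: LINKS[i][j] = number of links from domain i to domain j.
    // Domain 0 links twice to domain 1 and once to domain 2,
    // domain 1 links once to domain 0, and domain 2 has no outlinks.
    static final int[][] LINKS = {
        {0, 2, 1},
        {1, 0, 0},
        {0, 0, 0},
    };

    // Runs the iteration and returns the sum of all ranks afterwards.
    static double rankSum(boolean weightByLinkCount, int iterations) {
        int n = LINKS.length;
        double d = 0.85; // damping factor

        // Total outlinks per domain, counting repeated links.
        int[] outlinks = new int[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                outlinks[i] += LINKS[i][j];
            }
        }

        double[] old = new double[n];
        Arrays.fill(old, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            // Rank of domains without outlinks, spread evenly over all domains.
            double dangling = 0.0;
            for (int i = 0; i < n; i++) {
                if (outlinks[i] == 0) {
                    dangling += d * old[i] / n;
                }
            }

            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                next[j] = dangling + (1 - d) / n;
                for (int i = 0; i < n; i++) {
                    if (LINKS[i][j] == 0) {
                        continue; // i is not an inlink of j
                    }
                    // The link count is the factor that was missing in the question.
                    int weight = weightByLinkCount ? LINKS[i][j] : 1;
                    next[j] += weight * d * old[i] / outlinks[i];
                }
            }
            old = next;
        }

        double sum = 0.0;
        for (double rank : old) {
            sum += rank;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("with link-count weight:    " + rankSum(true, 50));
        System.out.println("without link-count weight: " + rankSum(false, 50));
    }
}
```

With the weight, every domain hands on all of its damped rank, so the total stays at 1 up to rounding. Without it, domain 0 hands on only two thirds of its share each round, and the total settles below 1.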

Read more: http://www.ccs.northeastern.edu/home/daikeshi/notes/PageRank.pdf
