How does the PageRank algorithm deal with webpages without outbound links?


Question

I am learning about the PageRank algorithm, so apologies for the newbie questions. I understand that the PR value of each page is calculated by summing the contributions from the pages that link to it.

In the example shown on Wikipedia, if every page has an outbound link, then the probabilities over all pages should sum to one. However, if a page has no outbound links (such as page A in the example), then the sum would not equal 1, right?

So, does the PageRank algorithm have to assume that every page has at least one outbound link? Could someone elaborate on how PageRank deals with pages that have no incoming or outbound links? How would the formulas change accordingly? Thanks.

Recommended answer

As PageRank is described in the original article and in the Wikipedia article, it is indeed not defined when out-degree(v)=0 for some v, since you get P(v,u)=d/n+(1-d)*0/0, which is undefined.
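For reference, the transition probability the answer is using can be written out as below. This is a sketch rather than a quotation from either article: the indicator A(v,u) is notation introduced here, and d is read as the random-jump probability, as the d/n term implies (Wikipedia's formula uses d the other way round, as the probability of following a link). The block assumes the amsmath package.

```latex
P(v,u) = \frac{d}{n} + (1-d)\,\frac{A(v,u)}{\text{out-degree}(v)},
\qquad
A(v,u) =
\begin{cases}
  1 & \text{if } (v,u) \text{ is an edge},\\
  0 & \text{otherwise.}
\end{cases}
```

When out-degree(v) = 0, every A(v,u) is also 0, so the second term becomes 0/0 for every u.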

A node with no outgoing edges is called a dangling node, and there are basically three common ways to handle them:


  1. Eliminate such nodes from the graph (and repeat the process iteratively until there are no dangling nodes left).
  2. Consider those pages to link back to the pages that linked to them (i.e., for each edge (u,v), if out-degree(v) = 0, regard (v,u) as an edge).
  3. Link the dangling node to all pages (usually including itself), which effectively makes the probability of a random jump from this node 1.
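The third option can be sketched as a small power iteration. This is a minimal illustration, not a reference implementation: the function name `pagerank`, the dictionary-based graph format, the default d = 0.15, and the fixed iteration count are all assumptions made for the example, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(out_links, d=0.15, iterations=100):
    """Power iteration with d as the random-jump probability (the answer's convention).

    out_links maps each node to the list of nodes it links to (hypothetical format).
    Dangling nodes are treated as if they linked to every page, themselves included.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Total rank currently held by dangling nodes; it is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Every page receives the random-jump share plus its share of dangling rank.
        new_rank = {u: d / n + (1 - d) * dangling / n for u in nodes}
        # Non-dangling nodes split the rest of their rank among their out-links.
        for v in nodes:
            for u in out_links[v]:
                new_rank[u] += (1 - d) * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# Page A has no outbound links, as in the question.
graph = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["B"]}
ranks = pagerank(graph)
print(ranks)
print(sum(ranks.values()))  # the ranks still form a probability distribution
```

Because the rank held by a dangling node is redistributed rather than lost, the values keep summing to one, which addresses the concern raised in the question.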

As for a page with no incoming links: that shouldn't be an issue, because everything is perfectly well defined. Such a node will have a PageRank of exactly d/n, because you can only reach it by a random jump from some node, and that is the probability of landing on it.
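The d/n value follows in one line from the stationary condition. The symbol r(v) for the rank of page v is notation introduced here, and the derivation assumes the ranks sum to one and that no page links to u, not even implicitly:

```latex
r(u) = \sum_{v} r(v)\,P(v,u)
     = \sum_{v} r(v)\,\frac{d}{n}
     = \frac{d}{n}\sum_{v} r(v)
     = \frac{d}{n}
```

If dangling nodes are handled by linking them to every page (option 3), they do link to u implicitly, so in that case u would receive somewhat more than d/n.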

Hope that answers your question!
