Networkx:pagerank,pagerank_numpy和pagerank_scipy之间的区别? [英] Networkx: Differences between pagerank, pagerank_numpy, and pagerank_scipy?

查看:610
本文介绍了Networkx:pagerank,pagerank_numpy和pagerank_scipy之间的区别?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有人知道Networkx中三个不同的页面等级函数之间的精度差异吗?

Does anyone know about the differences in accuracy between the three different pagerank functions in Networkx?

我有一个包含1000个节点和139732个边的图形,纯" pagerank函数似乎根本不起作用-除了两个节点外,所有其他节点都具有相同的PG,因此我假设函数不适用于大型图形吗?

I have a graph of 1000 nodes and 139732 edges, and the "plain" pagerank function didn't seem to work at all -- all but two of the nodes had the same PG, so I'm assuming this function doesn't work quite as well for large graphs?

pagerank_numpy的值似乎也比pagerank_scipy的值散布了一些.该函数的文档说:对于小图形,这将是最快,最准确的." 小"图是什么意思?

pagerank_numpy's values also seemed to be a little bit more spread out than pagerank_scipy's values. The documentation for this function says that "This will be the fastest and most accurate for small graphs." What is meant by "small" graphs?

此外,为什么pagerank_numpy不允许max_itertol自变量?

Also, why doesn't pagerank_numpy allow for max_iter and tol arguments?

推荐答案

三个函数中的每个函数都使用不同的方法来解决相同的问题:

Each of the three functions uses a different approach to solving the same problem:

networkx.pagerank()是幂方法的纯Python实现,用于计算最大特征值/特征向量或Google矩阵.它有两个控制精度的参数-tolmax_iter.

networkx.pagerank() is a pure-Python implementation of the power-method to compute the largest eigenvalue/eigenvector or the Google matrix. It has two parameters that control the accuracy - tol and max_iter.

networkx.pagerank_scipy()是幂方法的SciPy稀疏矩阵实现.它具有两个相同的精度参数.

networkx.pagerank_scipy() is a SciPy sparse-matrix implementation of the power-method. It has the same two accuracy parameters.

networkx.pagerank_numpy()是NumPy(完整)矩阵实现,它调用numpy.linalg.eig()函数来计算最大特征值和特征向量.该函数是LAPACK dgeev函数的接口,该函数使用没有可调参数的矩阵分解(直接)方法.

networkx.pagerank_numpy() is a NumPy (full) matrix implementation that calls the numpy.linalg.eig() function to compute the largest eigenvalue and eigenvector. That function is an interface to the LAPACK dgeev function which is uses a matrix decomposition (direct) method with no tunable parameters.

如果tol参数足够小并且max_iter参数足够大,则对于行为良好的图形,这三个函数都应产生相同的答案(在数值舍入内).哪一个速度更快取决于图形的大小以及幂方法在图形上的工作效果.

All three should produce the same answer (within numerical roundoff) for well-behaved graphs if the tol parameter is small enough and the max_iter parameter is large enough. Which one is faster depends on the size of your graph and how well the power method works on your graph.

In [12]: import networkx as nx

In [13]: G=nx.gnp_random_graph(1000,0.01,directed=True)

In [14]: %timeit nx.pagerank(G,tol=1e-10)
10 loops, best of 3: 157 ms per loop

In [15]: %timeit nx.pagerank_scipy(G,tol=1e-10)
100 loops, best of 3: 14 ms per loop

In [16]: %timeit nx.pagerank(G)
10 loops, best of 3: 137 ms per loop

这篇关于Networkx:pagerank,pagerank_numpy和pagerank_scipy之间的区别?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆