PyPI download counts seem unrealistic


Question

I put a package on PyPI for the first time about two months ago and have released some version updates since then. This week I noticed the download counts being recorded, and I was surprised to see that it had been downloaded hundreds of times. Over the next few days, I was even more surprised to see the count sometimes increase by hundreds per day, even though this is a niche statistical test toolbox. In particular, older versions of the package keep being downloaded, sometimes at higher rates than the newest version.

What is going on here?

Is there a bug in PyPI's download counting, or are there a lot of crawlers grabbing open-source code (such as mine)?

Recommended answer

This is kind of an old question at this point, but I noticed the same thing about a package I have on PyPI and investigated further. It turns out PyPI keeps reasonably detailed download statistics, including (apparently slightly anonymised) user agents. From that, it was apparent that most people downloading my package were things like "z3c.pypimirror/1.0.15.1" and "pep381client/1.5". (PEP 381 describes a mirroring infrastructure for PyPI.)
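The sketch below shows one way to get that per-client view. It assumes the statistics have already been exported as hypothetical `(user_agent, downloads)` pairs. The real export format is not described here. The sketch sums downloads per client name and drops the `/version` suffix, so every `z3c.pypimirror/...` build is grouped together.

```python
from collections import Counter
from typing import Iterable, Tuple


def downloads_by_client(records: Iterable[Tuple[str, int]]) -> Counter:
    """Sum download counts per client name, ignoring the version suffix.

    Each record is assumed (hypothetically) to be a (user_agent, downloads)
    pair such as ("pep381client/1.5", 12).
    """
    totals: Counter = Counter()
    for user_agent, downloads in records:
        # "z3c.pypimirror/1.0.15.1" -> "z3c.pypimirror"
        client = user_agent.split("/", 1)[0].strip() or "<empty>"
        totals[client] += downloads
    return totals
```

Calling `downloads_by_client(records).most_common(5)` then lists the clients responsible for most of the traffic.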

I wrote a quick script to tally everything up, first including all of them and then leaving out the most obvious bots. It turns out that literally 99% of the download activity for my package was caused by mirror bots: 14,335 downloads in total, compared to only 146 once the bots were filtered out. And that only leaves out the very obvious ones, so it's probably still an overestimate.
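The original script isn't shown, so the following is only a minimal sketch of that kind of tally. It assumes the same hypothetical `(user_agent, downloads)` records. The prefix list `MIRROR_BOT_PREFIXES` is an illustrative guess built from the two agents named above, not a complete list of mirror clients.

```python
from typing import Iterable, Tuple

# Hypothetical list of "very obvious" mirror-bot user-agent prefixes.
MIRROR_BOT_PREFIXES = ("z3c.pypimirror", "pep381client")


def is_mirror_bot(user_agent: str) -> bool:
    """Return True if the user agent starts with a known mirror-client name."""
    return user_agent.lower().startswith(MIRROR_BOT_PREFIXES)


def tally(records: Iterable[Tuple[str, int]]) -> Tuple[int, int]:
    """Return (all downloads, downloads excluding obvious mirror bots)."""
    records = list(records)  # allow a generator to be consumed twice
    total = sum(n for _, n in records)
    filtered = sum(n for ua, n in records if not is_mirror_bot(ua))
    return total, filtered


def bot_share(total: int, filtered: int) -> float:
    """Fraction of downloads attributed to the filtered-out bots."""
    return 0.0 if total == 0 else 1 - filtered / total
```

With the totals quoted above, `bot_share(14335, 146)` gives about 0.99, which matches the "99%" figure.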

It looks like the main reason PyPI needs mirrors is that it has them.
