Extract statistical information from Wikipedia article

Question

I'm currently extracting data from DBpedia articles using SPARQLWrapper for Python, but I can't seem to find how to extract the number of watchers (and other statistical information) for a given article.

Is there an easy way to achieve this? I don't mind if it's through DBpedia or directly through Wikipedia (using wget, for example).

Thanks for any suggestions.

Recommended answer

Getting the number of watchers for an arbitrary article is deliberately restricted, because it would be considered a security leak if anyone could find unwatched pages. For example, only privileged users have access to Special:UnwatchedPages. There is a toolserver tool (which has access to the database) that shows the number of watchers, but for the same reason it is restricted to pages with more than 30 watchers, at least for unauthenticated users.
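
As a rough illustration of that restriction: the MediaWiki query API accepts `inprop=watchers` on a `prop=info` request, and the `watchers` field is simply missing from the response when the wiki withholds the count. The sketch below is an assumption-laden example, not part of the original answer: the endpoint URL (English Wikipedia), the User-Agent string, and the helper names `build_info_query`, `extract_watchers` and `fetch` are all hypothetical choices.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia; any MediaWiki site exposes a similar api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_info_query(title):
    """Build the parameters for a prop=info request that asks for watchers."""
    return {
        "action": "query",
        "format": "json",
        "prop": "info",
        "inprop": "watchers",
        "titles": title,
    }


def extract_watchers(response):
    """Map each page title to its watcher count, or None if it was withheld."""
    pages = response.get("query", {}).get("pages", {})
    return {page["title"]: page.get("watchers") for page in pages.values()}


def fetch(params):
    """Send the request and decode the JSON reply (performs network I/O)."""
    url = API_URL + "?" + urlencode(params)
    # Hypothetical User-Agent; replace it with one that identifies your script.
    request = Request(url, headers={"User-Agent": "watchers-example/0.1"})
    with urlopen(request, timeout=10) as reply:
        return json.load(reply)
```

Calling `extract_watchers(fetch(build_info_query("Some article")))` should then give either a number or `None` for that title, depending on whether the wiki is willing to reveal the count to the caller.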

The MediaWiki query API mostly exposes only content and status information about articles, though you can also query and evaluate the public logs or revision histories to get statistical data about (public) user actions. For more statistics about the Wikimedia sites, have a look at Meta:Statistics, which lists various data sources (mostly http://stats.wikimedia.org/) and visualisations of them.
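
To make the revision-history route concrete, here is a minimal sketch of how such statistics might be derived from a `prop=revisions` response. The helper names `build_revisions_query` and `summarize_revisions` are hypothetical, and the code assumes the default JSON layout in which `query.pages` is an object keyed by page ID; sending the request itself is an ordinary HTTP GET against the wiki's `api.php` and is left out here.

```python
from collections import Counter


def build_revisions_query(title, limit=50):
    """Build parameters asking for the latest revisions of one page."""
    return {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user",
        "rvlimit": limit,
    }


def summarize_revisions(response):
    """Return (number of revisions, edits per user) for the returned batch."""
    edits_per_user = Counter()
    total = 0
    for page in response.get("query", {}).get("pages", {}).values():
        for revision in page.get("revisions", []):
            total += 1
            # The user name can be absent if it was hidden by an administrator.
            edits_per_user[revision.get("user", "<hidden>")] += 1
    return total, edits_per_user
```

The summary only covers the revisions contained in one response; for a page with a long history you would need to follow the API's continuation mechanism and merge the counters across batches.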
