Find main category for article using Wikipedia API

Question

I have a list of articles and I want to find the main category of each article.

Wikipedia lists its main categories here - http://en.wikipedia.org/wiki/Portal:Contents/Categories.

I am able to find the subcategories of each article using:

http://en.wikipedia.org/w/api.php?action=query&prop=categories&titles=%s&format=xml

I also am able to check whether a subcategory is within a category:

http://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=categories&clcategories=Domesticated animals&format=xml

This will tell me whether "domesticated animals" is a subcategory of Dog, but this is not quite what I want. I want to be able to check which main category 'domesticated animals' is in. Is this possible using the API?
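
A minimal Python sketch of the two queries above, using only the standard library. The helper names `build_categories_url` and `parse_categories` are illustrative, and `format=json` and `cllimit=max` are choices made here for convenience rather than part of the original URLs. The sample response is hand-written to mirror the shape of the API's JSON output; it is not real data.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_categories_url(title, only=None):
    """Build a prop=categories query URL for one page title."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",
        "format": "json",
    }
    if only:
        # clcategories restricts the listing to the given categories.
        params["clcategories"] = "|".join(only)
    return API_URL + "?" + urlencode(params)


def parse_categories(response):
    """Return the category titles listed for every page in a response."""
    pages = response.get("query", {}).get("pages", {})
    titles = []
    for page in pages.values():
        # Pages with no matching categories have no "categories" key.
        for cat in page.get("categories", []):
            titles.append(cat["title"])
    return titles


# Hand-written sample in the shape of a format=json response (not real data).
sample = json.loads("""
{"query": {"pages": {"1": {"pageid": 1, "ns": 0, "title": "Dog",
  "categories": [{"ns": 14, "title": "Category:Dogs"},
                 {"ns": 14, "title": "Category:Domesticated animals"}]}}}}
""")

print(build_categories_url("Dog"))
print(parse_categories(sample))
```

Passing `only=["Category:Domesticated animals"]` produces the second kind of query: the response then lists that category only if the page is directly in it.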

Recommended answer

First, there is no such thing as a "Wikipedia API". There is a MediaWiki (web) API. Knowing this will help you find information on the existing tools. https://www.mediawiki.org/wiki/API:Main_Page

The documentation there tells you that no API module will do all the category recursion for you. Why? Because 1) it's extremely inefficient, and 2) the recursion might go anywhere or never end.

However, there is now a solution, by Magnus Manske: https://tools.wmflabs.org/catscan2/reverse_tree.php?doit=1&language=en&project=wikipedia&title=Dog&namespace=0

For [[Dog]] it reports "Maximum depth: 61 levels Total categories along the way : 7988". Using that definition, the "root" category for [[Dog]], i.e. the farthest ancestor category, is "Industry by country". Probably not what you expected! From the English Wikipedia's perspective, however, the root category for any article is always the same: [[Category:Contents]].
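
If you would rather do the walk yourself, here is a rough sketch of one way to bound it. It is not how the tool above works; it is a plain breadth-first search up the category graph with a visited set (so cycles cannot loop forever) and a depth limit (so it cannot wander off). The names `fetch_parents` and `find_main_categories`, the User-Agent string, and the toy graph are all made up for illustration. `fetch_parents` reads only the first batch of results and ignores continuation. The set of "main" categories is an assumption you would fill in from Portal:Contents/Categories.

```python
import json
from collections import deque
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_parents(title):
    """Fetch the non-hidden categories of a page or category (first batch only)."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "clshow": "!hidden",
        "cllimit": "max",
        "format": "json",
    }
    # Placeholder User-Agent; replace it with your own contact details.
    req = Request(API_URL + "?" + urlencode(params),
                  headers={"User-Agent": "category-walk-sketch/0.1"})
    with urlopen(req) as resp:
        data = json.load(resp)
    pages = data.get("query", {}).get("pages", {})
    return [c["title"] for p in pages.values() for c in p.get("categories", [])]


def find_main_categories(start, main_categories, get_parents, max_depth=5):
    """Breadth-first walk up the category graph from `start`.

    Returns {main category: depth at which it was first reached}.
    """
    found = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        title, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit: do not expand this node further
        for parent in get_parents(title):
            if parent in seen:
                continue  # already visited: this is what breaks cycles
            seen.add(parent)
            if parent in main_categories:
                found[parent] = depth + 1
            else:
                queue.append((parent, depth + 1))
    return found


# Toy graph (hypothetical, includes a cycle) so the walk runs offline.
toy_graph = {
    "Dog": ["Category:Dogs"],
    "Category:Dogs": ["Category:Domesticated animals", "Category:Canines"],
    "Category:Domesticated animals": ["Category:Animals", "Category:Dogs"],
    "Category:Canines": ["Category:Animals"],
    "Category:Animals": ["Category:Nature"],
}
main = {"Category:Nature", "Category:Technology"}
result = find_main_categories("Dog", main, lambda t: toy_graph.get(t, []))
print(result)
```

To run it against the live site, pass `fetch_parents` as `get_parents`. Keep `max_depth` small: as the numbers above suggest, the graph fans out very quickly, and every expanded node costs one request.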
