How to get the result of a complex Wikipedia template?

Question

This question is a little hard to follow, but I'll do my best to explain it. First, here is an example page:

http://en.wikipedia.org/wiki/African_bush_elephant

That is a Wikipedia page, specifically a species page, since it has a "taxobox" on the right. I'm trying to parse the attributes in that taxobox using PHP. There are two ways to create such a taxobox on Wikipedia: manually, or with the special "auto taxobox" template.

I can parse the manual kind. I use Wikipedia's API to return the page's content in JSON format, then I use some regular expressions to extract the properties.
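The regular-expression approach described above might look roughly like the following. This is a minimal sketch in Python rather than the asker's PHP, and it is not the asker's actual code. It assumes the wikitext has already been fetched from the API, that each parameter sits on its own line, and that values do not span several lines. The function name `parse_taxobox` and the sample text are hypothetical.

```python
import re

# Matches one "| key = value" line of a template call.
# Assumption: one parameter per line; multi-line values are not handled.
PARAM_LINE = re.compile(
    r"^[ \t]*\|[ \t]*([A-Za-z_][\w ]*?)[ \t]*=[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)


def parse_taxobox(wikitext: str) -> dict:
    """Return the parameters of a manually written taxobox as a dict."""
    return {key: value for key, value in PARAM_LINE.findall(wikitext)}


# Hypothetical sample of a manually written taxobox.
SAMPLE = """{{Taxobox
| name = African bush elephant
| regnum = [[Animal]]ia
| genus = ''Loxodonta''
}}"""

if __name__ == "__main__":
    for key, value in parse_taxobox(SAMPLE).items():
        print(f"{key}: {value}")
```

This only works when properties such as the kingdom are written out literally in the page source, which is exactly what an auto taxobox does not do.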

In the case of an auto taxobox, however, the content returned looks like this:

> {{automatic taxobox | name = African Bush Elephant<ref
> name=MSW3>{{MSW3 Proboscidea | id = 11500009 | page =
> 91}}</ref> | status = VU | status_system = iucn3.1 | status_ref
> = <ref name=IUCN>{{IUCN2010|assessors=Blanc, J.|year=2008|version=2010.1|id=12392|title=Loxodonta
> africana|downloaded=04 April 2010}}</ref> | trend = unknown |
> image = African Bush Elephant.jpg | taxon = Loxodonta africana |
> synonyms = ''Loxodonta africana africana'' | binomial = ''Loxodonta
> africana'' | binomial_authority = ([[Johann Friedrich
> Blumenbach|Blumenbach]], 1797) }}

If you compare this with the actual page as rendered on Wikipedia, you'll notice that several attributes are missing. For example, the property "Kingdom" is visible on the real page but is not returned here. Other properties are missing in the same way.

This is likely because the template has to be processed server-side by Wikipedia to turn it into the actual output. I learned that the API has an "expandtemplates" action, to which you can send a snippet like the one above and get back the result as the user would see it. I'm using this for several templates and it works, but unfortunately not for the auto taxobox template. Click this link to see what expandtemplates returns:

As you can see, the template doesn't actually expand. Instead, the output contains more templates, nested and repeated several times.

So now I'm stuck trying to read these properties from pages that use the auto taxobox template. The only other approach I can think of is to skip the API and parse the HTML of the actual page. That would be doable for some properties, but others would be extremely fragile to parse.

Recommended answer

Use action=parse instead of action=expandtemplates. As you've noticed, expandtemplates only expands a single level; additionally, it won't fully preprocess the input (e.g., it won't successfully handle certain variable references inside templates).
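A request along these lines could be sketched as follows, in Python rather than the asker's PHP. The `action`, `format`, `prop`, `contentmodel`, and `text` parameters belong to the MediaWiki API; everything else is an assumption made for illustration: the helper names, the User-Agent string, and above all the guess that the rendered taxobox is a table whose data rows have a label cell ending in a colon followed by a value cell. The real markup may differ and should be checked against an actual response.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_params(wikitext: str) -> dict:
    """Parameters asking action=parse to render a wikitext snippet to HTML."""
    return {"action": "parse", "format": "json", "prop": "text",
            "contentmodel": "wikitext", "text": wikitext}


def extract_html(response: dict) -> str:
    """Return the rendered HTML; the default JSON format nests it under '*'."""
    text = response["parse"]["text"]
    return text["*"] if isinstance(text, dict) else text


class _RowCollector(HTMLParser):
    """Collects the text of every td/th cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def taxobox_fields(html: str) -> dict:
    """Map 'Label:' cells to their value cells (assumed two-cell row layout)."""
    collector = _RowCollector()
    collector.feed(html)
    collector.close()
    return {row[0].rstrip(":"): row[1]
            for row in collector.rows
            if len(row) == 2 and row[0].endswith(":")}


def render_wikitext(wikitext: str) -> str:
    """POST the snippet to the API and return the rendered HTML (needs network)."""
    body = urlencode(build_parse_params(wikitext)).encode("utf-8")
    # Hypothetical User-Agent; replace it with one identifying your own client.
    request = Request(API_URL, data=body,
                      headers={"User-Agent": "taxobox-sketch/0.1 (example)"})
    with urlopen(request, timeout=30) as response:
        return extract_html(json.load(response))
```

Calling `taxobox_fields(render_wikitext(snippet))` with the automatic taxobox snippet from the question would then yield whatever label/value rows the rendered table contains, provided the layout assumption holds.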
