How do I grab just the parsed Infobox of a Wikipedia article?

Question

I'm still stuck on my problem of trying to parse articles from Wikipedia. Specifically, I want to parse the infobox section of an article: my application has references to countries, and on each country page I would like to show the infobox from that country's Wikipedia article. I'm using PHP here - I would greatly appreciate any code snippets or advice on what I should be doing.

Thanks again.

EDIT

Well, I have a db table with names of countries, and I have a script that takes a country and shows its details. I would like to grab the infobox - the blue box with all the country details, images etc. - as it appears on Wikipedia and show it on my page. I would like to know a really simple and easy way to do that, or have a script that just downloads the infobox information to a local or remote system which I could access myself later on. I'm open to ideas here, as long as the end result is the infobox on my page - of course with a little "Content by Wikipedia" link at the bottom :)

EDIT

I think I found what I was looking for on http://infochimps.org - they have loads of datasets in what I think is YAML. I can use this information as it is, but I would need a way to refresh it from Wikipedia now and then, although I believe infoboxes rarely change, especially for countries, unless some nation decides to change its capital city or so.

Recommended answer

I suggest performing a WebRequest against Wikipedia. From there you will have the page, and you can parse or query out the data you need using a regex, a character crawl, or some other technique you are familiar with. Essentially a screen scrape!
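As a rough illustration of the screen-scrape idea, here is a minimal sketch in Python using only the standard library (the same approach carries over to PHP with a DOM parser). It assumes the infobox is rendered as a `<table>` whose `class` attribute contains `infobox`, which is how Wikipedia country articles are usually marked up but is not guaranteed. The names `InfoboxExtractor` and `extract_infobox` are made up for this example.

```python
from html.parser import HTMLParser


class InfoboxExtractor(HTMLParser):
    """Collects the raw HTML of the first <table> whose class list contains "infobox"."""

    def __init__(self):
        # Keep entities such as &amp; untouched so the output is still valid HTML.
        super().__init__(convert_charrefs=False)
        self.depth = 0      # nesting depth of <table> tags inside the infobox
        self.parts = []     # raw HTML fragments of the infobox
        self.done = False   # True once the infobox table has been closed

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Assumption: the infobox is a table with "infobox" among its classes.
            classes = (dict(attrs).get("class") or "").split()
            if tag == "table" and "infobox" in classes:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        if tag == "table":
            self.depth += 1  # nested table inside the infobox
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        self.parts.append(f"</{tag}>")
        if tag == "table":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&#{name};")


def extract_infobox(page_html):
    """Return the infobox table as an HTML string, or None if none was found."""
    parser = InfoboxExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

Tracking the table depth is what keeps nested tables inside the infobox from ending the capture early, which is the usual way a plain regex goes wrong here.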

EDIT - I would add to this answer that you can use HtmlAgilityPack if you are in C# land. For PHP it looks like SimpleHtmlDom is the equivalent. Having said that, it looks like Wikipedia has a more than adequate API. This question probably answers your needs best:

Is there a Wikipedia API?
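To make the API route concrete, here is a hedged sketch in Python (the asker uses PHP, where the same URL can be fetched with any HTTP client). It asks the MediaWiki Action API to render only the lead section of a page with `action=parse`, on the assumption that the infobox lives in section 0, which is typical for country articles but not guaranteed. The helper names, the English Wikipedia endpoint, and the User-Agent string are placeholders chosen for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia; other language editions use their own host.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_parse_url(title):
    """Build an action=parse URL that renders only the lead section as HTML."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",       # only the rendered HTML is needed
        "section": 0,         # assumption: the infobox sits in the lead section
        "format": "json",
        "formatversion": 2,   # makes parse.text a plain string
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def html_from_response(payload):
    """Pull the rendered HTML out of a decoded API response."""
    if "error" in payload:
        raise ValueError(payload["error"].get("info", "unknown API error"))
    return payload["parse"]["text"]


def fetch_lead_section_html(title):
    """Fetch the lead section of an article (performs a network request)."""
    request = Request(
        build_parse_url(title),
        # Placeholder value; replace with something that identifies your application.
        headers={"User-Agent": "infobox-demo/0.1 (example contact)"},
    )
    with urlopen(request, timeout=10) as response:
        return html_from_response(json.load(response))
```

The returned HTML still contains the introductory paragraphs, so the infobox table would have to be cut out of it afterwards. Caching the result locally fits the asker's point that country infoboxes rarely change.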
