Get all page titles on Wikipedia that contain a specific word

Question

I am writing an "auto-wikifier" tool using HTML and JavaScript. For each word in the text to be wikified, I need to obtain a list of pages that contain that word (so that the matching phrases in the text can be automatically wikified, if they are found). Is there a way to obtain a list of all Wikipedia pages that contain a specific word, using one of Wikipedia's APIs or web services?

function getMatchingPageTitles(theString){
    //get a list of all matching page titles for a specific string, using one of Wikipedia's APIs or web services
}

Recommended answer

First, I'm not sure I understand how something like that would be useful. (Wikipedia has articles for all the common words, and I don't think links to them would be of any use.)

But if you really wanted to do something like this, I think a much better way would be to use the API to find out which words from your input text have articles.

For example, for the string I am writing an "auto-wikifier" tool, your query could look something like:

http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=I|am|writing|an|auto-wikifier|tool

The response is:

<api>
  <query>
    <normalized>
      <n from="am" to="Am" />
      <n from="writing" to="Writing" />
      <n from="an" to="An" />
      <n from="auto-wikifier" to="Auto-wikifier" />
      <n from="tool" to="Tool" />
    </normalized>
    <pages>
      <page ns="0" title="Auto-wikifier" missing="" />
      <page pageid="2513432" ns="0" title="Am" />
      <page pageid="2513422" ns="0" title="An" />
      <page pageid="25346998" ns="0" title="I" />
      <page pageid="30677" ns="0" title="Tool" />
      <page pageid="32977" ns="0" title="Writing" />
    </pages>
  </query>
</api>

Some notes:

  • The results are not returned in the order in which you specified the titles.
  • If a page doesn't exist, its result has a missing="" attribute.
  • JSON and JSONP formats are available too, which might be more suitable for JavaScript.
  • The titles parameter has a limit of 50 titles per query.
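
The notes above can be sketched in JavaScript roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names (chunkTitles, buildQueryUrl, extractExistingTitles) are hypothetical, and it assumes the JSON output with formatversion=2 (where pages is an array and a missing page carries missing: true), the https endpoint, and the origin=* parameter for anonymous cross-origin requests from a browser.

```javascript
// Sketch only: helper names are hypothetical, not part of the original answer.
const API_URL = "https://en.wikipedia.org/w/api.php";
const MAX_TITLES_PER_QUERY = 50; // the titles limit mentioned in the notes

// Split the de-duplicated words into batches that respect the titles limit.
function chunkTitles(words, size = MAX_TITLES_PER_QUERY) {
    const unique = [...new Set(words)];
    const chunks = [];
    for (let i = 0; i < unique.length; i += size) {
        chunks.push(unique.slice(i, i + size));
    }
    return chunks;
}

// Build one query URL; titles are joined with "|" as in the example query.
function buildQueryUrl(titles) {
    const params = new URLSearchParams({
        action: "query",
        format: "json",
        formatversion: "2", // assumption: pages come back as an array
        origin: "*",        // assumption: anonymous cross-origin browser request
        titles: titles.join("|"),
    });
    return `${API_URL}?${params}`;
}

// Keep only the titles of pages that exist (no "missing" or "invalid" flag).
function extractExistingTitles(response) {
    const pages = (response.query && response.query.pages) || [];
    return pages
        .filter((page) => !page.missing && !page.invalid)
        .map((page) => page.title);
}

// Query the API batch by batch and collect the titles that have articles.
async function getMatchingPageTitles(words) {
    const found = [];
    for (const chunk of chunkTitles(words)) {
        const res = await fetch(buildQueryUrl(chunk));
        found.push(...extractExistingTitles(await res.json()));
    }
    return found;
}
```

A call such as getMatchingPageTitles(["I", "am", "writing", "an", "auto-wikifier", "tool"]).then(console.log) would log whichever of those titles currently have articles. The returned titles are in normalized form (for example "Tool" rather than "tool"), so mapping them back to the input words would also require reading the normalized list from the response.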
