DuckDuckGo API not returning results


Problem description

Edit: I now realize the API is simply inadequate for this and is not even working. I would like to redirect my question: I want to be able to automatically search DuckDuckGo using their "I'm feeling ducky" feature, so that I can search for "stackoverflow", for instance, and get the main page ("http://stackoverflow.com/") as my result.

I am using the DuckDuckGo API.

I found that when I use:

r = duckduckgo.query("example")

the results do not reflect a manual search. Namely:

for result in r.results:
    print result

Output:

>>> 
>>> 

Nothing.

Indexing into r.results raises an out-of-bounds error, since the list is empty.

How am I supposed to get results for my search?

It seems the API (according to its documented examples) is supposed to answer questions and give a sort of "I'm feeling ducky" result in the form of r.answer.text

But the website is built in such a way that I cannot search it and parse the results using normal methods.

I would like to know how I am supposed to parse search results with this API, or with any other method, from this site.

Thank you.

Recommended answer

If you visit the DuckDuckGo API page, you will find some notes about using the API. The first note clearly states:

  As this is a Zero-click Info API, most deep queries (non topic names) will be blank.

And here is the list of those fields, all of them blank:

Abstract: ""
AbstractText: ""
AbstractSource: ""
AbstractURL: ""
Image: ""
Heading: ""
Answer: ""
Redirect: ""
AnswerType: ""
Definition: ""
DefinitionSource: ""
DefinitionURL: ""
RelatedTopics: [ ]
Results: [ ]
Type: ""
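
A minimal sketch of how one might check whether a response carries any data at all before indexing into it. It is written for Python 3 (unlike the Python 2 snippets elsewhere in this answer) and uses only the standard library. The endpoint URL, the format=json parameter, and the helper names build_api_url and non_empty_fields are assumptions made for illustration; they do not come from the original post.

```python
import json
from urllib.parse import urlencode

# Assumption: the JSON endpoint of the Instant Answer API (not stated in the post).
API_ENDPOINT = "https://api.duckduckgo.com/"


def build_api_url(query):
    """Return the request URL for a JSON lookup (hypothetical helper)."""
    return API_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})


def non_empty_fields(payload):
    """Keep only the fields that actually carry data (hypothetical helper)."""
    return {name: value for name, value in payload.items() if value}


# A response shaped like the blank one listed above (shortened).
blank_response = json.loads(
    '{"Abstract": "", "Answer": "", "Redirect": "", '
    '"RelatedTopics": [], "Results": [], "Type": ""}'
)

print(build_api_url("example"))
# An empty dict here means the API had no zero-click answer for the query.
print(non_empty_fields(blank_response))
```

Checking for an empty dict first avoids the out-of-bounds error described in the question.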

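So it may be a pity, but their API simply truncates a bunch of results and does not give them to you, possibly in order to work faster, and it seems nothing can be done about it except using DuckDuckGo.com directly.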

So, obviously, in this case the API is not the way to go.

As for me, I see only one way left: retrieving the raw HTML from duckduckgo.com and parsing it using, e.g., html5lib (it is worth mentioning that their HTML is well structured).

It is also worth mentioning that parsing HTML pages is not the most reliable way to scrape data, because the HTML structure can change, while an API usually stays stable until changes are publicly announced.

Here is an example of how such parsing can be done with BeautifulSoup:

from BeautifulSoup import BeautifulSoup
import urllib
import re

site = urllib.urlopen('http://duckduckgo.com/?q=example')
data = site.read()

parsed = BeautifulSoup(data)
topics = parsed.findAll('div', {'id': 'zero_click_topics'})[0]
results = topics.findAll('div', {'class': re.compile('results_*')})

print results[0].text

This script prints:

u'Eixample, an inner suburb of Barcelona with distinctive architecture'

The problem with querying the main page directly is that it uses JavaScript to produce the actual results (as opposed to the related topics), so you can get those results only from the HTML version. The HTML version has a different link:

  • http://duckduckgo.com/?q=example # JavaScript version
  • http://duckduckgo.com/html/?q=example # HTML-only version
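
The two URL forms above can be built programmatically so that queries containing spaces or special characters are escaped properly. This is a small Python 3 sketch; the function name search_url and its html_only flag are hypothetical names chosen for illustration, and only the host and the /html/ path come from the links above.

```python
from urllib.parse import urlencode

BASE = "http://duckduckgo.com/"  # host taken from the links above


def search_url(query, html_only=True):
    """Build a search URL; html_only selects the /html/ variant (hypothetical helper)."""
    path = "html/" if html_only else ""
    return BASE + path + "?" + urlencode({"q": query})


print(search_url("example", html_only=False))  # JavaScript version
print(search_url("example"))                   # HTML-only version
print(search_url("stack overflow"))            # the space is escaped by urlencode
```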

Let's see what we can get:

site = urllib.urlopen('http://duckduckgo.com/html/?q=example')
data = site.read()
parsed = BeautifulSoup(data)

first_link = parsed.findAll('div', {'class': re.compile('links_main*')})[0].a['href']

The value stored in the first_link variable is a link to the first result (not a related search) that the search engine outputs:

http://www.iana.org/domains/example

To get all the links, iterate over the tags found (data other than links can be extracted in a similar way):

for i in parsed.findAll('div', {'class': re.compile('links_main*')}):
    print i.a['href']

http://www.iana.org/domains/example
https://twitter.com/example
https://www.facebook.com/leadingbyexample
http://www.trythisforexample.com/
http://www.myspace.com/leadingbyexample?_escaped_fragment_=
https://www.youtube.com/watch?v=CLXt3yh2g0s
https://en.wikipedia.org/wiki/Example_(musician)
http://www.merriam-webster.com/dictionary/example
...

Note that the HTML-only version contains only results; for related searches you must use the JavaScript version (the URL without the html part).
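
The snippets in this answer target Python 2 and the old BeautifulSoup 3 package. As a rough sketch only, the same link extraction can be approximated in Python 3 with the standard-library html.parser module. The class name ResultLinkParser, the function extract_result_links, and the SAMPLE markup are all hypothetical; the links_main class name is taken from the code in this answer, and the real page markup may have changed since it was written.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect the first <a href> inside each <div> whose class starts with 'links_main'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0          # <div> nesting depth inside the current result block
        self._captured = False   # True once the current block has yielded a link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self._depth:
                self._depth += 1
            elif any(c.startswith("links_main") for c in classes):
                self._depth = 1
                self._captured = False
        elif tag == "a" and self._depth and not self._captured and attrs.get("href"):
            self.links.append(attrs["href"])
            self._captured = True

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1


def extract_result_links(html):
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical markup, shaped after the class names used in this answer.
SAMPLE = """
<div class="links_main links_deep">
  <a href="http://www.iana.org/domains/example">Example Domain</a>
  <div class="snippet"><a href="http://ignored.invalid/">more</a></div>
</div>
<div class="related"><a href="http://related.invalid/">related</a></div>
<div class="links_main"><a href="https://twitter.com/example">Twitter</a></div>
"""

links = extract_result_links(SAMPLE)
# Guard against an empty list instead of indexing blindly.
print(links[0] if links else "no results")
```

Returning a plain list and checking it before indexing keeps the "I'm feeling ducky" case (take the first link) safe when the page layout changes and nothing matches.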
