DuckDuckGo API not returning results


Problem description

Edit: I now realize the API is simply inadequate and does not even work. I would like to redirect my question: I want to be able to automagically search DuckDuckGo using their "I'm feeling ducky" feature, so that I can search for "stackoverflow", for instance, and get the main page (https://stackoverflow.com/) as my result.

I am using the duckduckgo API.

And I found that when using:

r = duckduckgo.query("example")

The results do not reflect a manual search, namely:

for result in r.results:
    print result

Results in:

>>> 
>>> 

Nothing.

Indexing into results raises an out-of-bounds error, since the list is empty.

How am I supposed to get results for my search?

It seems the API (according to its documented examples) is supposed to answer questions and give a sort of "I'm feeling ducky" answer in the form of r.answer.text.

But the website is built in such a way that I cannot search it and parse the results using normal methods.

I would like to know how I am supposed to parse search results with this API or any other method from this site.

Thank you.

Solution

If you visit the DuckDuckGo API page, you will find some notes about using the API. The first note says clearly:

As this is a Zero-click Info API, most deep queries (non topic names) will be blank.

And here is the list of those fields:

Abstract: ""
AbstractText: ""
AbstractSource: ""
AbstractURL: ""
Image: ""
Heading: ""
Answer: ""
Redirect: ""
AnswerType: ""
Definition: ""
DefinitionSource: ""
DefinitionURL: ""
RelatedTopics: [ ]
Results: [ ]
Type: ""
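
To see which of these fields actually carry data for a given query, the response can be filtered down to its non-empty entries. The following is a minimal Python 3 sketch, not part of the original answer: it assumes the JSON endpoint at https://api.duckduckgo.com/ with the q and format=json parameters, and the helper names (build_url, non_empty_fields, fetch) are made up for illustration. The sample dictionary is hypothetical and only mimics a blank response.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Zero-click (Instant Answer) API.
API_URL = "https://api.duckduckgo.com/"


def build_url(query):
    """Build a request URL that asks for a JSON response."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return API_URL + "?" + params


def non_empty_fields(response):
    """Keep only the fields that actually carry data."""
    return {key: value for key, value in response.items() if value}


def fetch(query):
    """Perform the HTTP request (needs network access; not called here)."""
    with urllib.request.urlopen(build_url(query)) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Hypothetical response for a "deep" query: every field is blank.
blank = {"Abstract": "", "Answer": "", "RelatedTopics": [], "Results": [], "Type": ""}
print(non_empty_fields(blank))  # {}
```

Calling non_empty_fields(fetch("example")) against the live service would show at a glance whether a query is a topic name the API knows about or one of the deep queries that come back blank.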

So it may be a pity, but the API simply omits most results and does not return them, possibly so that it can respond faster, and it seems nothing can be done about that other than using DuckDuckGo.com itself.

So, obviously, in that case the API is not the way to go.

As for me, I see only one way left: retrieving the raw HTML from duckduckgo.com and parsing it with, e.g., html5lib (it is worth mentioning that their HTML is well structured).

It is also worth mentioning that parsing HTML pages is not the most reliable way to scrape data, because the HTML structure can change, while an API usually stays stable until changes are publicly announced.

Here is an example of how such parsing can be done with BeautifulSoup:

from BeautifulSoup import BeautifulSoup
import urllib
import re

site = urllib.urlopen('http://duckduckgo.com/?q=example')
data = site.read()

parsed = BeautifulSoup(data)
topics = parsed.findAll('div', {'id': 'zero_click_topics'})[0]
results = topics.findAll('div', {'class': re.compile('results_*')})

print results[0].text

This script prints:

u'Eixample, an inner suburb of Barcelona with distinctive architecture'
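
One detail of the script above is worth noting: in regular-expression syntax, `_*` means "zero or more underscores", so the pattern 'results_*' matches any string that contains "results", not only names that start with "results_". A minimal Python 3 sketch of that behaviour follows; the class names are made up for illustration, and it assumes the pattern is applied with search semantics.

```python
import re

pattern = re.compile('results_*')

# Hypothetical class names, used only to illustrate the matching behaviour.
names = ['results_disambig', 'results', 'noresults', 'links_main']
matches = {name: bool(pattern.search(name)) for name in names}
print(matches)

# A stricter alternative that only accepts names starting with "results_".
strict = re.compile(r'^results_')
strict_matches = {name: bool(strict.search(name)) for name in names}
print(strict_matches)
```

If only classes with the "results_" prefix are wanted, the anchored pattern is the safer choice.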

The problem with querying the main page directly is that it uses JavaScript to produce the actual search results (as opposed to related topics), so you can use the HTML version instead, which gives you the results only. The HTML version has a different URL, with /html/ in the path:

Let's see what we can get:

site = urllib.urlopen('http://duckduckgo.com/html/?q=example')
data = site.read()
parsed = BeautifulSoup(data)

first_link = parsed.findAll('div', {'class': re.compile('links_main*')})[0].a['href']

The result stored in the first_link variable is a link to the first result (not a related search) that the search engine returns:

http://www.iana.org/domains/example

To get all the links, you can iterate over the tags found (data other than links can be retrieved in a similar way):

for i in parsed.findAll('div', {'class': re.compile('links_main*')}):
    print i.a['href']

http://www.iana.org/domains/example
https://twitter.com/example
https://www.facebook.com/leadingbyexample
http://www.trythisforexample.com/
http://www.myspace.com/leadingbyexample?_escaped_fragment_=
https://www.youtube.com/watch?v=CLXt3yh2g0s
https://en.wikipedia.org/wiki/Example_(musician)
http://www.merriam-webster.com/dictionary/example
...

Note that the HTML-only version contains only results; for related searches you must use the JavaScript version (without the html part in the URL).
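
The snippets above are written for Python 2 and BeautifulSoup 3. As a rough Python 3 sketch of the same idea, the link extraction can be done with the standard library's html.parser instead of BeautifulSoup (a swapped-in technique, not the answer's original code). The ResultLinkParser class, the extract_links helper, and the SAMPLE markup are hypothetical; the sample only imitates the links_main structure described above, and the live page may look different.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect the first <a href> inside each <div> whose class contains 'links_main'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0      # nesting depth inside the current result <div>
        self._found = False  # whether the current result already yielded a link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth:
                self._depth += 1  # nested <div> inside a result block
            elif "links_main" in (attrs.get("class") or ""):
                self._depth = 1   # entering a new result block
                self._found = False
        elif tag == "a" and self._depth and not self._found and attrs.get("href"):
            self.links.append(attrs["href"])
            self._found = True

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1


def extract_links(html):
    """Return the first link of every result block in the given HTML text."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical markup imitating the structure of the HTML-only results page.
SAMPLE = """
<div class="links_main links_deep">
  <a href="http://www.iana.org/domains/example">Example Domain</a>
  <div class="snippet"><a href="http://ignored.example/">more</a></div>
</div>
<div class="sidebar"><a href="http://also-ignored.example/">other</a></div>
<div class="links_main">
  <a href="https://en.wikipedia.org/wiki/Example_(musician)">Example</a>
</div>
"""

print(extract_links(SAMPLE))
```

To try it against the live site, the page text fetched with urllib.request.urlopen from the /html/ URL could be decoded and passed to extract_links; whether the current markup still uses the links_main class is an assumption that should be checked first.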
