Programmatically searching Google in Python using Custom Search

Question

I have a snippet of code using the pygoogle Python module that lets me programmatically and succinctly search Google for a term:

 g = pygoogle(search_term)
 g.pages = 1
 results = g.get_urls()[0:10]

I just found out that, unfortunately, pygoogle has been discontinued and replaced by something called Google Custom Search. I looked at the other related questions on SO but didn't find anything I could use. I have two questions:

1) Does Google Custom Search allow me to do exactly what I am doing in the three lines above?

2) If yes, where can I find example code that does exactly what I am doing above? If no, what is the alternative for doing what I did with pygoogle?

Recommended answer

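It is possible to do this. The setup is not very straightforward, but the end result is that you can search the entire web from Python with a few lines of code.

There are 3 main steps: getting a Google API key, setting up a Custom Search Engine that searches the entire web, and installing the Google API client for Python.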

The pygoogle page states:

Unfortunately, Google no longer supports the SOAP API for search, nor do they provide new license keys. In a nutshell, PyGoogle is pretty much dead at this point.

You can use their AJAX API instead. Take a look here for sample code: http://dcortesi.com/2008/05/28/google-ajax-search-api-example-python-code/

... but you actually can't use the AJAX API either. You have to get a Google API key: https://developers.google.com/api-client-library/python/guide/aaa_apikeys For simple experimental use I suggest a "server key".
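Once you have a key, one common way to keep it out of your source code is to read it from the environment. A minimal sketch, assuming a hypothetical variable name GOOGLE_API_KEY (that name and the helper load_api_key are made up for illustration, not something Google prescribes):

```python
import os


def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name` in an environment-style mapping.

    GOOGLE_API_KEY is a hypothetical variable name chosen for this sketch;
    use whatever name suits your setup.
    """
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    key = env.get(name)
    if not key:
        raise RuntimeError("environment variable %s is not set" % name)
    return key
```

Passing a plain dict as `env` makes the helper easy to try out without touching the real environment.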

Indeed, the old API is not available. The best new API that is available is Custom Search. It seems to support only searching within specific domains; however, after following the steps from this SO answer you can search the whole web:

  1. From the Google Custom Search homepage ( http://www.google.com/cse/ ), click Create a Custom Search Engine.
  2. Type a name and description for your search engine.
  3. Under Define your search engine, in the Sites to Search box, enter at least one valid URL (For now, just put www.anyurl.com to get past this screen. More on this later ).
  4. Select the CSE edition you want and accept the Terms of Service, then click Next. Select the layout option you want, and then click Next.
  5. Click any of the links under the Next steps section to navigate to your Control panel.
  6. In the left-hand menu, under Control Panel, click Basics.
  7. In the Search Preferences section, select Search the entire web but emphasize included sites.
  8. Click Save Changes.
  9. In the left-hand menu, under Control Panel, click Sites.
  10. Delete the site you entered during the initial setup process.

This approach is also recommended by Google: https://support.google.com/customsearch/answer/2631040
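With the API key from the first step and the ID of the engine you just created, you can sanity-check the setup before installing anything. Below is a minimal sketch using only the standard library. It assumes the Custom Search JSON API endpoint and its key, cx, q and num query parameters; the helper names build_search_url and fetch_results are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(search_term, api_key, cse_id, num=10):
    """Build the GET URL for one page of results (the API allows at most 10)."""
    params = {"key": api_key, "cx": cse_id, "q": search_term, "num": num}
    return CSE_ENDPOINT + "?" + urlencode(params)


def fetch_results(search_term, api_key, cse_id, num=10):
    """Send the request and return the decoded JSON response as a dict.

    Needs network access and valid credentials.
    """
    with urlopen(build_search_url(search_term, api_key, cse_id, num)) as resp:
        return json.load(resp)
```

If fetch_results returns a dictionary rather than raising an HTTP error, the key and the engine ID are most likely set up correctly.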

Install the Google API client for Python with pip install google-api-python-client. More information is here:

  • repo: https://github.com/google/google-api-python-client
  • more info: https://developers.google.com/api-client-library/python/apis/customsearch/v1
  • complete docs: https://api-python-client-doc.appspot.com/

So, after setting this up, you can follow the code samples from a few places:

cse() function documentation: https://google-api-client-libraries.appspot.com/documentation/customsearch/v1/python/latest/customsearch_v1.cse.html

and end up with this:

from googleapiclient.discovery import build
import pprint

my_api_key = "Google API key"
my_cse_id = "Custom Search Engine ID"

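def google_search(search_term, api_key, cse_id, **kwargs):
    # Build a client for the Custom Search API, authenticated with the API key.
    service = build("customsearch", "v1", developerKey=api_key)
    # Extra keyword arguments (e.g. num=10) are passed through to cse().list().
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    # A response with no matches may lack the 'items' key, so default to [].
    return res.get('items', [])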

results = google_search(
    'stackoverflow site:en.wikipedia.org', my_api_key, my_cse_id, num=10)
for result in results:
    pprint.pprint(result)

After some tweaking you could write some functions that behave exactly like your snippet, but I'll skip this step here.
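For reference, one possible shape of that tweak is sketched below. It is not a drop-in replacement for pygoogle: the function name get_urls and the pages and per_page parameters are chosen here only to mirror the snippet in the question, and the sketch assumes each result item carries its URL in a 'link' field and that the API's start parameter is a 1-based result index.

```python
def get_urls(service, search_term, cse_id, pages=1, per_page=10):
    """Return result URLs, roughly like pygoogle's get_urls().

    `service` is the object returned by
    googleapiclient.discovery.build("customsearch", "v1", developerKey=...).
    The API accepts at most 10 results per request, so keep per_page <= 10.
    """
    urls = []
    for page in range(pages):
        start = 1 + page * per_page  # index of the first result on this page
        res = service.cse().list(
            q=search_term, cx=cse_id, num=per_page, start=start
        ).execute()
        # A response with no matches may lack the 'items' key.
        items = res.get("items", [])
        urls.extend(item["link"] for item in items)
        if len(items) < per_page:
            break  # a short page means there is nothing further to fetch
    return urls


# Usage (needs network access and real credentials):
# from googleapiclient.discovery import build
# service = build("customsearch", "v1", developerKey=my_api_key)
# results = get_urls(service, search_term, my_cse_id)[0:10]
```

Passing the service in, rather than building it inside the function, avoids rebuilding the client on every search and makes the function easy to exercise with a stub.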
