Download images with the Google Custom Search API


Problem description


I have used the Google Image Search API in Python to download the first 20 image results with the following code:

import os
import sys
import time
from urllib import FancyURLopener
import urllib2
import simplejson


searchTerm = "Cat"

# Replace spaces ' ' in the search term with '%20' in order to comply with the request format
searchTerm = searchTerm.replace(' ', '%20')


# Start FancyURLopener with a defined User-Agent version
class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

myopener = MyOpener()

# Set count to 0
count = 0

for i in range(0, 4):
    # Note that 'start' changes on each iteration in order to request a new set of images per loop
    url = ('https://ajax.googleapis.com/ajax/services/search/images?'
           + 'v=1.0&q=' + searchTerm + '&start=' + str(i*4)
           + '&userip=MyIP&imgsz=xlarge|xxlarge|huge')
    print url
    request = urllib2.Request(url, None, {'Referer': 'testing'})
    response = urllib2.urlopen(request)

    # Get results using JSON
    results = simplejson.load(response)
    data = results['responseData']
    dataInfo = data['results']

    # Iterate over each result and get the unescaped url
    for myUrl in dataInfo:
        count = count + 1
        print myUrl['unescapedUrl']
        # newpath and num are assumed to be defined elsewhere in the original script
        os.chdir(newpath)
        myopener.retrieve(myUrl['unescapedUrl'], str(num) + '-' + str(count))

        # Sleep for a few seconds to prevent IP blocking by Google
        time.sleep(3)

But now I would like to use Google Custom Search instead, in order to get better results. I understand that I should register to get an API key, but I did not find any simple example like the code I posted above. Can someone help? I am really lost in the Google documentation.

Apparently there is a restriction on the free API of 100 requests per day, is that correct?

Edit: this is where I am right now, but it still does not work:

import os
import sys
import time
from urllib import FancyURLopener
import urllib2
import json
import pprint


searchTerm = "Cat"

# Start FancyURLopener with a defined User-Agent version
class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

myopener = MyOpener()


# API_KEY must be replaced with a valid Custom Search API key
url = ('https://www.googleapis.com/customsearch/v1?key=API_KEY&cx=017576662512468239146:omuauf_lfve'
       + '&q=' + searchTerm + '&searchType=image'
       + '&start=0' + '&imgSize=xlarge|xxlarge|huge')
print url
request = urllib2.Request(url, None, {'Referer': 'testing'})
response = urllib2.urlopen(request)

# Get results using JSON
data = json.load(response)
pprint.PrettyPrinter(indent=4).pprint(data['items'][0])

Solution

You can use the Google APIs Client Library for Python (the google-api-python-client package).


Here is a sample, which I changed to the following:

from apiclient.discovery import build

service = build("customsearch", "v1",
                developerKey="** your developer key **")

res = service.cse().list(
    q='butterfly',
    cx='** your cx **',
    searchType='image',
    num=3,
    imgType='clipart',
    fileType='png',
    safe='off'
).execute()

if 'items' not in res:
    print 'No result!!\nres is: {}'.format(res)
else:
    for item in res['items']:
        print '{}:\n\t{}'.format(item['title'], item['link'])

Output:

Clipart - Butterfly:
        http://openclipart.org/image/800px/svg_to_png/3965/jonata_Butterfly.png
Animal, Butterfly, Insect, Nature - Free image - 158831:
        http://pixabay.com/static/uploads/photo/2013/07/13/11/51/animal-158831_640.png
Clipart - Monarch Butterfly:
        http://openclipart.org/image/800px/svg_to_png/110023/Monarch_Butterfly_by_Merlin2525.png
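The sample above only prints the image titles and links. Since the original question is about saving the images to disk, below is a minimal sketch of how the returned links could be downloaded, continuing from the res object returned by service.cse().list(...).execute(); the download_dir folder and the numeric file-naming scheme are placeholders added here for illustration, not part of the original answer.

import os
from urllib import FancyURLopener

class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (compatible; image downloader)'

opener = MyOpener()

# Hypothetical target folder for the downloaded files
download_dir = 'downloads'
if not os.path.exists(download_dir):
    os.makedirs(download_dir)

# 'res' is the response dict returned by service.cse().list(...).execute() above
count = 0
for item in res.get('items', []):
    count = count + 1
    link = item['link']
    # The sample above requests fileType='png', so '.png' is used for the local name
    filename = os.path.join(download_dir, str(count) + '.png')
    print 'Saving {} as {}'.format(link, filename)
    opener.retrieve(link, filename)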

Yes, there is a limitation for the free edition, and you can monitor your usage from the Google Developers Console.

Note:

Go to your Custom Search Engine control panel, select your custom search engine, then in the Basics tab set the Image search option to ON, and in the Sites to search section select the Search the entire web but emphasize included sites option.
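One more practical note related to the original goal of fetching the first 20 images: the Custom Search API returns at most 10 items per request, so extra pages have to be requested with the start parameter. Below is a rough sketch of that paging, assuming the same service object and placeholders as in the answer above; the query and page count are illustrative only.

# Sketch: collect roughly 20 image results by paging 10 at a time.
# The Custom Search API 'start' parameter is 1-based.
all_items = []
for start_index in (1, 11):
    page = service.cse().list(
        q='butterfly',
        cx='** your cx **',
        searchType='image',
        num=10,
        start=start_index
    ).execute()
    all_items.extend(page.get('items', []))

print '{} image results collected'.format(len(all_items))

Keep in mind that each page counts as a separate request against the 100 requests per day free quota mentioned above.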

