Batch searching on Google: 403 error


Problem description


I am trying to run a batch search: go over a list of strings and, for each one, print the first address that Google search returns:

#!/usr/bin/python
import json
import urllib
import time
import pandas as pd

df = pd.read_csv("test.csv")
saved_column = df.Name #you can also use df['column_name']

for name in saved_column:
  query = urllib.urlencode({'q': name})
  url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
  search_response = urllib.urlopen(url)
  search_results = search_response.read()
  results = json.loads(search_results)
  data = results['responseData']

  address = data[u'results'][0][u'url']

  print address

I get a 403 error from the server: 'Suspected Terms of Service Abuse. Please see http://code.google.com/apis/errors', u'responseStatus': 403

Is what I'm doing not allowed under Google's terms of service?

I also tried putting time.sleep(5) in the loop, but I get the same error.

Thank you in advance

Solution

This is not allowed by the Google TOS. You really can't scrape Google without them getting angry. Their blocker is also pretty sophisticated, so you can get around it for a little while with random delays, but that fails pretty quickly.

Sorry, you're out of luck on this one.
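
One thing the loop can still do is notice the block instead of crashing on data[u'results']. The sketch below is only an illustration, not a way around the TOS: it checks responseStatus before reading responseData. The helper name first_result_url and the sample bodies are hypothetical, and treating 200 as the success status is an assumption; the question only shows the 403 case.

```python
import json


def first_result_url(raw_body):
    """Return the first result URL from a response body, or None if there is none.

    Hypothetical helper: assumes a successful response carries
    responseStatus == 200 (the question only shows the 403 case).
    """
    results = json.loads(raw_body)

    # A blocked request reports its status inside the JSON body.
    if results.get("responseStatus") != 200:
        return None

    # responseData may be missing or null, so fall back to empty containers.
    data = results.get("responseData") or {}
    hits = data.get("results") or []
    return hits[0]["url"] if hits else None


# Made-up sample bodies, shaped like the fields used in the question.
blocked = json.dumps({"responseData": None, "responseStatus": 403})
ok = json.dumps({
    "responseData": {"results": [{"url": "http://example.com/"}]},
    "responseStatus": 200,
})

print(first_result_url(blocked))
print(first_result_url(ok))
```

When the helper returns None, the loop can stop and report the status rather than keep sending requests that will be rejected.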
