Using Tor + Privoxy to scrape Google Shopping results: how to avoid being blocked?

Views: 73 | Posted: 2021/7/16 21:43:50 | Tags: python, scrape, tor

Problem description

I have installed Tor + Privoxy on my server and both are working fine (tested). But when I use urllib2 (Python) to scrape Google Shopping results through the proxy, Google always blocks me, sometimes with a 503 error and sometimes with a 403. Does anyone have a solution that would help me avoid this? It would be much appreciated!

The source code I am using:

    import urllib2

    # Request headers sent with every query
    _HEADERS = {
        'User-Agent': 'Mozilla/5.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Encoding': 'deflate',
        'Connection': 'close',
        'DNT': '1'
    }

    request = urllib2.Request("https://www.google.com/#q=iphone+5&tbm=shop", headers=_HEADERS)

    # Route traffic through Privoxy (127.0.0.1:8118), which forwards to Tor.
    # The "https" entry is needed because the URL above uses https.
    proxy_support = urllib2.ProxyHandler({"http": "127.0.0.1:8118", "https": "127.0.0.1:8118"})
    opener = urllib2.build_opener(proxy_support)
    urllib2.install_opener(opener)

    try:
        response = urllib2.urlopen(request)
        html = response.read()
        print html
    except urllib2.HTTPError as e:
        # Google answers with 403 or 503 when it blocks the request
        print e.code
        print e.reason

Note: when I don't use the proxy, it works fine!
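
The code in the question targets Python 2's urllib2. For readers on Python 3, the same setup can be sketched roughly as follows, assuming Privoxy listens on 127.0.0.1:8118 as in the question. The helper names `build_proxied_opener` and `fetch` are hypothetical and not part of the original post. The sketch only shows how to route both http and https requests through the proxy and how to surface the 403/503 status code; it does not by itself prevent Google from blocking Tor exit nodes.

```python
import urllib.error
import urllib.request

# Assumption: Privoxy listen address, taken from the question.
PRIVOXY_ADDR = "127.0.0.1:8118"

# A subset of the headers used in the question.
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def build_proxied_opener(proxy_addr=PRIVOXY_ADDR):
    """Return an opener that sends both http:// and https:// URLs via the proxy."""
    proxy_url = "http://" + proxy_addr
    proxies = {"http": proxy_url, "https": proxy_url}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def fetch(opener, url, timeout=30):
    """Fetch url and return (status, body).

    HTTP error responses such as 403 or 503 are returned as a status code
    instead of being raised. Connection failures (urllib.error.URLError)
    are not caught and still propagate to the caller.
    """
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()
```

With Tor and Privoxy running, calling `fetch(build_proxied_opener(), url)` returns the status code alongside the body, which makes it easier to log how often a 403 or 503 comes back.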
