Why can't I scrape Amazon by BeautifulSoup?
Problem Description
Here is my Python code:
import urllib2
from bs4 import BeautifulSoup
page = urllib2.urlopen("http://www.amazon.com/")
soup = BeautifulSoup(page)
print soup
It works for google.com and many other websites, but it doesn't work for amazon.com.
I can open amazon.com in my browser, but the resulting "soup" is still None.
Besides, I find that it cannot scrape appannie.com either. However, rather than returning None, the code raises an error:
HTTPError: HTTP Error 503: Service Temporarily Unavailable
So I suspect that Amazon and App Annie block scraping.
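A 503 here is the server actively refusing the request, not a parsing problem, so it helps to distinguish "blocked by the server" from "page is genuinely empty". A minimal sketch of that, using Python 3's `urllib` (the `urllib2` call in the question raises the equivalent exception; the `fetch` helper name is just for illustration):

```python
from urllib.error import HTTPError
from urllib.request import urlopen

def fetch(url):
    """Return the page body as bytes, or None when the server refuses the request."""
    try:
        return urlopen(url).read()
    except HTTPError as e:
        # e.g. "blocked: HTTP Error 503: Service Temporarily Unavailable"
        print("blocked: HTTP Error %d: %s" % (e.code, e.reason))
        return None
```

Catching `HTTPError` lets you log the status code (503, 403, ...) instead of crashing, which makes it obvious when a site is rejecting your client rather than serving an empty page.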
Please do try it yourself instead of just downvoting the question :(
Thanks.
Recommended Answer
Add a header, and it will work:
from bs4 import BeautifulSoup
import requests
url = "http://www.amazon.com/"
# send a browser-like User-Agent so the server doesn't reject the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, "lxml")
print(soup)
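If you would rather stay with the standard library than switch to `requests`, the same User-Agent trick works with `urllib.request`. A Python 3 sketch (the header value is just a plausible browser string, not anything Amazon specifically requires):

```python
from urllib.request import Request, urlopen

# the same browser-like User-Agent used in the requests-based answer
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/54.0.2840.71 Safari/537.36'}
req = Request("http://www.amazon.com/", headers=headers)
# urlopen(req) would now send the browser-like User-Agent with the request
```

The point in both versions is the same: the default client identifies itself as a script (`Python-urllib/x.y` or `python-requests/x.y`), and some sites return 503 or an empty page for such clients.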