Python urllib2.HTTPError: HTTP Error 503: Service Unavailable on valid website

Problem description

I have been using Amazon's Product Advertising API to generate URLs that contain prices for a given book. One URL that I have generated is the following:

http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327

When I click on the link or paste the link on the address bar, the web page loads fine. However, when I execute the following code I get an error:

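import urllib2  # Python 2 module; Python 3 splits it into urllib.request and urllib.error
url = "http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327"
html_contents = urllib2.urlopen(url)  # this call raises the HTTPError described below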

The error is urllib2.HTTPError: HTTP Error 503: Service Unavailable. First of all, I don't understand why I even get this error since the web page successfully loads.

Also, another weird behavior that I have noticed is that the following code sometimes does and sometimes does not give the stated error:

html_contents = urllib2.urlopen("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")

I am totally lost on how this behavior occurs. Is there any fix or workaround for this? My goal is to read the HTML contents of the URL.

Edit

I don't know why Stack Overflow rewrites the Amazon link in my code above into a rads.stackoverflow link. In any case, ignore the rads.stackoverflow link and use the link above between the quotes.

Recommended answer

It's because Amazon doesn't allow automated access to its data, so it rejects your request because it didn't come from a proper browser. If you look at the content of the 503 response, it says:

To discuss automated access to Amazon data please contact api-services-support@amazon.com. For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
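To see that message yourself, you can catch the error and read its body, because an HTTPError is also a file-like response. The following is a minimal sketch, not the asker's code: it assumes Python 3, where urllib2 was split into urllib.request and urllib.error (in Python 2 the same idea works with urllib2.HTTPError, e.code and e.read()). The names fetch_or_explain and stub_opener and the example.invalid URL are hypothetical, and the stub stands in for the real server so the demo runs without contacting Amazon.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def fetch_or_explain(url, opener=urllib.request.urlopen):
    """Return (status, body_text); on an HTTP error, return the error page's body."""
    try:
        response = opener(url)
        return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so its body can be read.
        return err.code, err.read().decode("utf-8", errors="replace")


def stub_opener(url):
    # Hypothetical stand-in for a server that rejects the request with a 503.
    raise urllib.error.HTTPError(
        url,
        503,
        "Service Unavailable",
        Message(),
        io.BytesIO(b"To discuss automated access to Amazon data please contact ..."),
    )


status, body = fetch_or_explain("http://example.invalid/", opener=stub_opener)
print(status)
print(body)
```

Calling fetch_or_explain(url) without the opener argument would use the real urlopen and return whatever status and body the site actually sends.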

This is because the User-Agent for Python's urllib is so obviously not a browser. You could always fake the User-Agent, but that's not really good (or moral) practice.
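To check what urllib announces itself as, you can inspect the default headers on a fresh opener. This is a small sketch assuming Python 3's urllib.request (the successor of urllib2). The second half shows how a header is attached to a request. The "my-price-checker" value and the example.com URL are made-up placeholders that identify the script honestly rather than imitating a browser, and nothing here implies that any particular site will accept them. No network request is made.

```python
import urllib.request

# A fresh opener carries urllib's built-in default headers.
default_headers = dict(urllib.request.build_opener().addheaders)
print(default_headers["User-agent"])  # a "Python-urllib/<version>" string

# Hypothetical identifying value; Request stores the header name as "User-agent".
request = urllib.request.Request(
    "http://www.example.com/",
    headers={"User-Agent": "my-price-checker/0.1 (contact: me@example.com)"},
)
print(request.get_header("User-agent"))
```

For Amazon specifically, the 503 message quoted above points to the official APIs as the supported route for automated access.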

As a side note, as mentioned in another answer, the requests library is really good for HTTP access in Python.
