Is there a way to scrape Amazon Product Listing page using Python?


Question

I'm trying to scrape product listing pages that display the vendors and prices of particular products, but urllib.urlopen isn't working. It works on all other pages on Amazon, so I'm wondering whether Amazon blocks bots from scraping product listing pages. Can anyone verify this? Using Chrome I can still view the page source...

Here's an example of a product listing page I want to scrape: http://www.amazon.com/gp/offer-listing/B007E84H96/ref=dp_olp_new?ie=UTF8&condition=new

Recommended answer

Trying 'curl -I' (which sends a HEAD request) on that URL returns 405 MethodNotAllowed:

$ curl -I 'http://www.amazon.com/gp/offer-listing/B007E84H96/ref=dp_olp_new?ie=UTF8&condition=new' 
HTTP/1.1 405 MethodNotAllowed
Date: Wed, 13 Feb 2013 16:41:08 GMT
Server: Server
x-amz-id-1: 1WKZG9N0SE87E3KFG6YV
allow: POST, GET
x-amz-id-2: Apluv2QBzzrmXlRWjlClRGsQQ1TbwsxObe2hxfdrGhO/OQziI/aIT3vkVjCPn+qz
Vary: Accept-Encoding,User-Agent
Content-Type: text/html; charset=ISO-8859-1

Adding a User-Agent string with the '-A' switch didn't affect that return value.
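The 'allow: POST, GET' line in the response above suggests that the HEAD method itself is what gets rejected, so a 405 from 'curl -I' may not say much about what a GET would return. One way to check this from Python is sketched below. It assumes Python 3's urllib.request rather than the urllib.urlopen from the question. The helper names (build_request, probe) and the header values in TRIAL_HEADERS are made up for illustration, and nothing here is known to get past Amazon's checks.

```python
import urllib.error
import urllib.request

# Hypothetical header set for experimentation; these values are not from the
# original post and are not known to satisfy Amazon's bot detection.
TRIAL_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.8",
}


def build_request(url, headers=None, method="GET"):
    """Build a request with an explicit HTTP method and header set."""
    return urllib.request.Request(url, headers=dict(headers or {}), method=method)


def probe(url, headers=None, method="GET", timeout=10):
    """Send the request and return the HTTP status code of the answer."""
    request = build_request(url, headers, method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        return err.code
```

Calling probe(url, method="HEAD") and then probe(url, TRIAL_HEADERS) on the same URL would show whether the status depends on the method, the headers, or neither.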

You might experiment with different HTTP headers to see if you can find something that passes. But it's pretty obvious that Amazon wouldn't want you to screen-scrape prices from their product pages. A little googling brings up this page:

http://www.distil.it/amazon-cracks-down-on-price-scraping/#.URvBFo4ry0s


With no fanfare or warning, Amazon in June began enforcing a long-standing policy prohibiting screen-scraping tools from harvesting listing information directly from its marketplace, a favorite tool for providers of repricing services for merchants, according to a third-party developer.

Note also that Amazon has an API for its affiliates; there are some related questions about using that API from Python in the "Related" question links in the right column.
