How to extract the ASIN from an Amazon product page


Problem description

I have the following web page (Product page) and I'm trying to get the ASIN from it (in this case ASIN=B014MHZ90M), but I don't have a clue how to get it from the page.

I'm using Python 3.4, Scrapy, and the following code:

hxs = Selector(response)
product_name = "".join(hxs.xpath('//span[contains(@class,"a-text-ellipsis")]/a/text()').extract())
product_model = hxs.xpath('//body//div[@id="buybox_feature_div"]//form[@method="post"]/input[@id="ASIN"/text()').extract()

This way I don't get the required field (the ASIN number).
1. What should I do to get the product model (ASIN)?

2. Is there a way to debug such code? I'm using PyCharm, but I could not use the debugger; I can only run the code without seeing what is going on in "slow motion".

Thanks in advance, everyone.

Recommended answer

Looking at the Amazon page you linked, the ASIN number appears in the "Product Details" section. In the scrapy shell, the following XPath

response.xpath('//li[contains(.,"ASIN: ")]//text()').extract()

returns

[u'ASIN: ', u'B014MHZ90M']
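
The list above still contains the "ASIN: " label, so the value has to be picked out of the text nodes. A minimal sketch of one way to do that follows. The helper name `pick_asin` is hypothetical, and the pattern assumes an ASIN is a 10-character string of uppercase letters and digits, as in the example value.

```python
import re

# Assumption: an ASIN is a 10-character token of uppercase letters and digits.
ASIN_PATTERN = re.compile(r"\b[A-Z0-9]{10}\b")


def pick_asin(text_nodes):
    """Return the first ASIN-looking token in a list of text nodes, or None."""
    for node in text_nodes:
        match = ASIN_PATTERN.search(node)
        if match:
            return match.group(0)
    return None


# The text nodes returned by the XPath in the answer.
nodes = ['ASIN: ', 'B014MHZ90M']
print(pick_asin(nodes))
```

Matching on the shape of the value rather than taking the second list item keeps the helper working if the page adds or drops whitespace-only text nodes around the label.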

For debugging XPath expressions I always use the scrapy shell and Firebug for Firefox.
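
As for the XPath in the question, two things probably keep it from matching. The predicate `[@id="ASIN"` is missing its closing `]`, and an `<input>` element carries its data in the `value` attribute rather than in a text node, so `/text()` has nothing to return. The sketch below illustrates the second point with the standard library only. The HTML fragment is a made-up stand-in for the buy-box form, and the real page markup may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for the buy-box form;
# the markup of the real product page may differ.
FRAGMENT = """
<div id="buybox_feature_div">
  <form method="post">
    <input type="hidden" id="ASIN" name="ASIN" value="B014MHZ90M" />
  </form>
</div>
"""

root = ET.fromstring(FRAGMENT)
asin_input = root.find(".//form[@method='post']/input[@id='ASIN']")

print(asin_input.text)          # None: an <input> element has no text node
print(asin_input.get("value"))  # the ASIN is stored in the value attribute
```

If the live page does contain such an input, the corresponding Scrapy expression would be something like `response.xpath('//input[@id="ASIN"]/@value').extract()`; this has not been checked against the actual page.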
