检索特定产品的亚马逊评论 [英] Retrieve Amazon Reviews for a particular product

查看:104
本文介绍了检索特定产品的亚马逊评论的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在研究一个项目,该项目需要分析特定产品的评论并获得有关该产品的总体思路.

I'm currently working on a research project which need to analyze reviews of a particular product and get an overall idea about the product.

我听说亚马逊是获得产品评论/评论的好地方.有没有办法通过API从Amazon检索这些用户评论/评论?我尝试了几种python代码,但没有用.如果没有API可以检索数据,我是否需要编写蜘蛛程序?

I heard that amazon is a good place to get product reviews/comments. Is there any way to retrieve those user reviews/comments from Amazon via an API?? I tried several python codes but it doesn't work.. Do i need to write a spider if there is no API to retrieve data?

是否有任何方法/场所来检索给定产品的用户评论?

Are there any approaches/places to retrieve user reviews for a given product?

推荐答案

www.Scrapehero.com上有一个很好的教程,介绍如何抓取Amazon产品详细信息:

www.Scrapehero.com has a great tutorial on how to scrape Amazon product details: How To Scrape Amazon Product Details and Pricing using Python

他们使用的完整纯文本代码是...产品由其ASIN标识,因此请将数组值更改为您感兴趣的产品.

The complete plain text code they use is ... Products are identified by their ASIN so change the array values to the products you are interested in watching.

from lxml import html  
import csv,os,json
import requests
from exceptions import ValueError
from time import sleep

def AmzonParser(url):
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)    AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
page = requests.get(url,headers=headers)
while True:
    sleep(3)
    try:
        doc = html.fromstring(page.content)
        XPATH_NAME = '//h1[@id="title"]//text()'
        XPATH_SALE_PRICE = '//span[contains(@id,"ourprice") or contains(@id,"saleprice")]/text()'
        XPATH_ORIGINAL_PRICE = '//td[contains(text(),"List Price") or contains(text(),"M.R.P") or contains(text(),"Price")]/following-sibling::td/text()'
        XPATH_CATEGORY = '//a[@class="a-link-normal a-color-tertiary"]//text()'
        XPATH_AVAILABILITY = '//div[@id="availability"]//text()'

        RAW_NAME = doc.xpath(XPATH_NAME)
        RAW_SALE_PRICE = doc.xpath(XPATH_SALE_PRICE)
        RAW_CATEGORY = doc.xpath(XPATH_CATEGORY)
        RAW_ORIGINAL_PRICE = doc.xpath(XPATH_ORIGINAL_PRICE)
        RAw_AVAILABILITY = doc.xpath(XPATH_AVAILABILITY)

        NAME = ' '.join(''.join(RAW_NAME).split()) if RAW_NAME else None
        SALE_PRICE = ' '.join(''.join(RAW_SALE_PRICE).split()).strip() if RAW_SALE_PRICE else None
        CATEGORY = ' > '.join([i.strip() for i in RAW_CATEGORY]) if RAW_CATEGORY else None
        ORIGINAL_PRICE = ''.join(RAW_ORIGINAL_PRICE).strip() if RAW_ORIGINAL_PRICE else None
        AVAILABILITY = ''.join(RAw_AVAILABILITY).strip() if RAw_AVAILABILITY else None

        if not ORIGINAL_PRICE:
            ORIGINAL_PRICE = SALE_PRICE

        if page.status_code!=200:
            raise ValueError('captha')
        data = {
                'NAME':NAME,
                'SALE_PRICE':SALE_PRICE,
                'CATEGORY':CATEGORY,
                'ORIGINAL_PRICE':ORIGINAL_PRICE,
                'AVAILABILITY':AVAILABILITY,
                'URL':url,
                }

        return data
    except Exception as e:
        print e

def ReadAsin():
# AsinList = csv.DictReader(open(os.path.join(os.path.dirname(__file__),"Asinfeed.csv")))
AsinList = ['B0046UR4F4',
'B00JGTVU5A',
'B00GJYCIVK',
'B00EPGK7CQ',
'B00EPGKA4G',
'B00YW5DLB4',
'B00KGD0628',
'B00O9A48N2',
'B00O9A4MEW',
'B00UZKG8QU',]
extracted_data = []
for i in AsinList:
    url = "http://www.amazon.com/dp/"+i
    print "Processing: "+url
    extracted_data.append(AmzonParser(url))
    sleep(5)
f=open('data.json','w')
json.dump(extracted_data,f,indent=4)


if __name__ == "__main__":
ReadAsin()

这篇关于检索特定产品的亚马逊评论的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆