How to scrape product details on an Amazon webpage using BeautifulSoup


Question

For the webpage http://www.amazon.com/Harry-Potter-Prisoner-Azkaban-Rowling/dp/0439136369/ref=pd_sim_b_2?ie=UTF8&refRID=1MFBRAECGPMVZC5MJCWG how can I scrape the product details and output them as a dict in Python? In this case, the dict output I want would be:

Age Range: 9 - 12 years
Grade Level: 4 - 7
...
...

I'm new to BeautifulSoup and haven't found a good example of how to do this. I'd like an example to follow.

Recommended answer

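The idea is to iterate over the product detail items matched by the CSS selector table#productDetailsTable div.content ul li, using the bold text in each item as the key and the text that follows it as the value. For the page above, this prints: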

{
    u'Age Range': u'9 - 12 years',
    u'Amazon Best Sellers Rank': u'#1,440 in Books (',
    u'Average Customer Review': u'',
    u'Grade Level': u'4 - 7',
    u'ISBN-10': u'0439136369',
    u'ISBN-13': u'978-0439136365',
    u'Language': u'English',
    u'Lexile Measure': u'880L',
    u'Mass Market Paperback': u'448 pages',
    u'Product Dimensions': u'1.2 x 5.2 x 7.8 inches',
    u'Publisher': u'Scholastic Paperbacks (September 11, 2001)',
    u'Series': u'Harry Potter (Book 3)',
    u'Shipping Weight': u'11.2 ounces ('
}

Note that the loop breaks as soon as an AttributeError is raised. This happens once an li element no longer contains any bold text.
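The approach described above can be sketched roughly as follows. This is a minimal illustration, not the original answer's code: the function name parse_product_details and the SAMPLE_HTML snippet are hypothetical, and the snippet is only a simplified stand-in for the product details markup, which may differ on the live page. It assumes the HTML has already been downloaded as a string.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the product details markup.
# The real page is much larger and its structure may have changed.
SAMPLE_HTML = """
<table id="productDetailsTable">
  <tr><td>
    <div class="content">
      <ul>
        <li><b>Age Range:</b> 9 - 12 years</li>
        <li><b>Grade Level:</b> 4 - 7</li>
        <li><b>ISBN-10:</b> 0439136369</li>
        <li>Item without a bold label</li>
      </ul>
    </div>
  </td></tr>
</table>
"""


def parse_product_details(html):
    """Build a dict from the bold labels in the product details list."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for li in soup.select("table#productDetailsTable div.content ul li"):
        try:
            label = li.b  # None when the item has no bold text
            key = label.text.strip().rstrip(":")
        except AttributeError:
            # No more bold labels: stop, as described in the note above.
            break
        sibling = label.next_sibling
        # Use the text right after the label; fall back to an empty
        # string when the label is followed by a tag or by nothing.
        details[key] = sibling.strip() if isinstance(sibling, str) else ""
    return details


if __name__ == "__main__":
    print(parse_product_details(SAMPLE_HTML))
```

The built-in "html.parser" is used here so that no extra parser needs to be installed. Fetching the live page is left out on purpose, since the site may require specific headers or serve different markup.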
