Python Web Scraping; Beautiful Soup

Question

This is covered in this post: Python web scraping involving HTML tags with attributes (http://stackoverflow.com/questions/1391657/python-web-scraping-involving-html-tags-with-attributes)

But I haven't been able to do something similar for this web page: http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland?

I want to scrape these values:

  <td class="price city-2">
                                                      NZ$15.62
                                      <span style="white-space:nowrap;">(AU$12.10)</span>
                                                  </td>
  <td class="price city-1">
                                                      AU$15.82
                              </td>

Basically price city-2 and price city-1 (NZ$15.62 and AU$15.82)

What I have so far:

import urllib2

from BeautifulSoup import BeautifulSoup

url = "http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland?"
page = urllib2.urlopen(url)

soup = BeautifulSoup(page)

price2 = soup.findAll('td', attrs = {'class':'price city-2'})
price1 = soup.findAll('td', attrs = {'class':'price city-1'})

for price in price2:
    print price

for price in price1:
    print price

Ideally, I'd also like to have comma separated values for:

<th colspan="3" class="clickable">Food</th>, 

Extracting 'Food',

<td class="item-name">Daily menu in the business district</td>

Extracting 'Daily menu in the business district'

and then the values for price city-2, and price-city1

So the printout would be:

Food, Daily menu in the business district, NZ$15.62, AU$15.82

Thanks!

Recommended answer

I find BeautifulSoup awkward to use. Here is a version based on the webscraping module:

from webscraping import common, download, xpath

# download html
D = download.Download()
html = D.get('http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland')

# extract data
items = xpath.search(html, '//td[@class="item-name"]')
city1_prices = xpath.search(html, '//td[@class="price city-1"]')
city2_prices = xpath.search(html, '//td[@class="price city-2"]')

# display and format
for item, city1_price, city2_price in zip(items, city1_prices, city2_prices):
    print item.strip(), city1_price.strip(), common.remove_tags(city2_price, False).strip()

Output:

Daily menu in the business district AU$15.82 NZ$15.62

Combo meal in fast food restaurant (Big Mac Meal or similar) AU$7.40 NZ$8.16

1/2 Kg (1 lb.) of chicken breast AU$6.07 NZ$10.25

...
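
For the comma-separated output the question asks for (category, item, city-2 price, city-1 price), here is a rough sketch that uses only the Python 3 standard library (html.parser and csv) in place of BeautifulSoup or the webscraping module. It runs on a small inline fragment pieced together from the snippets quoted in the question, so the row layout is an assumption and the real page may be structured differently. The names SAMPLE_HTML, PriceTableParser and rows_to_csv are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical fragment assembled from the snippets quoted in the question;
# the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th colspan="3" class="clickable">Food</th></tr>
  <tr>
    <td class="item-name">Daily menu in the business district</td>
    <td class="price city-2">
        NZ$15.62
        <span style="white-space:nowrap;">(AU$12.10)</span>
    </td>
    <td class="price city-1">
        AU$15.82
    </td>
  </tr>
</table>
"""


class PriceTableParser(HTMLParser):
    """Collects [category, item, city-2 price, city-1 price] rows."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._category = ""    # text of the most recent <th class="clickable">
        self._row = {}         # cells of the row currently being built
        self._field = None     # name of the cell we are inside, if any
        self._span_depth = 0   # > 0 while inside a nested <span>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "th" and "clickable" in classes:
            self._field = "category"
            self._category = ""
        elif tag == "td" and "item-name" in classes:
            self._field = "item"
        elif tag == "td" and "price" in classes:
            for city in ("city-1", "city-2"):
                if city in classes:
                    self._field = city
        elif tag == "span" and self._field:
            self._span_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._span_depth:
            self._span_depth -= 1
        elif tag in ("td", "th"):
            self._field = None
        elif tag == "tr":
            # Only rows that contain an item cell produce output.
            if "item" in self._row:
                self.rows.append([
                    self._category,
                    self._row["item"],
                    self._row.get("city-2", ""),
                    self._row.get("city-1", ""),
                ])
            self._row = {}

    def handle_data(self, data):
        # Ignore text outside the cells of interest and text inside nested
        # <span> elements (the converted price in parentheses).
        if self._field is None or self._span_depth:
            return
        text = data.strip()
        if not text:
            return
        if self._field == "category":
            self._category += text
        else:
            self._row[self._field] = self._row.get(self._field, "") + text


def rows_to_csv(rows):
    """Formats the rows as CSV text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


parser = PriceTableParser()
parser.feed(SAMPLE_HTML)
parser.close()
csv_text = rows_to_csv(parser.rows)
print(csv_text, end="")
```

Using csv.writer rather than joining with ", " means an item name that itself contains a comma gets quoted instead of breaking the columns. To run this against the live page, the downloaded HTML would be passed to feed() in place of SAMPLE_HTML, provided the page really uses the same class names and row structure.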
