Python Web Scraping; Beautiful Soup

Views: 185 Published: 2016/8/5 19:18:23 Tags: python screen-scraping beautifulsoup


Question

This was covered in this post: Python web scraping involving HTML tags with attributes (http://stackoverflow.com/questions/1391657/python-web-scraping-involving-html-tags-with-attributes)

But I haven't been able to do something similar for this web page: http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland?

I want to scrape the values:

  <td class="price city-2">
                                                      NZ$15.62
                                      <span style="white-space:nowrap;">(AU$12.10)</span>
                                                  </td>
  <td class="price city-1">
                                                      AU$15.82
                              </td>

Basically price city-2 and price city-1 (NZ$15.62 and AU$15.82)

Currently I have:

import urllib2

from BeautifulSoup import BeautifulSoup

url = "http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland?"
page = urllib2.urlopen(url)

soup = BeautifulSoup(page)

price2 = soup.findAll('td', attrs = {'class':'price city-2'})
price1 = soup.findAll('td', attrs = {'class':'price city-1'})

for price in price2:
    print price

for price in price1:
    print price

Ideally, I'd also like to have comma-separated values for:

<th colspan="3" class="clickable">Food</th>,

Extracting 'Food',

<td class="item-name">Daily menu in the business district</td>

Extracting 'Daily menu in the business district'

and then the values for price city-2 and price city-1

So the printout would be:

Food, Daily menu in the business district, NZ$15.62, AU$15.82

Thanks!

Recommended answer

I find BeautifulSoup awkward to use. Here is a version based on the webscraping module:

from webscraping import common, download, xpath

# download html
D = download.Download()
html = D.get('http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland')

# extract data
items = xpath.search(html, '//td[@class="item-name"]')
city1_prices = xpath.search(html, '//td[@class="price city-1"]')
city2_prices = xpath.search(html, '//td[@class="price city-2"]')

# display and format
for item, city1_price, city2_price in zip(items, city1_prices, city2_prices):
    print item.strip(), city1_price.strip(), common.remove_tags(city2_price, False).strip()

Output:

Daily menu in the business district AU$15.82 NZ$15.62

Combo meal in fast food restaurant (Big Mac Meal or similar) AU$7.40 NZ$8.16

1/2 Kg (1 lb.) of chicken breast AU$6.07 NZ$10.25

...
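
The output above omits the category name and the commas the question asked for. As a rough sketch of one way to get that exact printout, the following uses only Python 3's standard-library html.parser (swapped in here so it runs without installing anything; it is not the webscraping module or BeautifulSoup). The sample markup, the PriceTableParser class and the scrape_prices function are all hypothetical: the markup is modelled on the fragments quoted in the question, and it assumes each category heading and each item sits in its own table row, which may not match the real page.

```python
from html.parser import HTMLParser

# Hypothetical sample markup built from the fragments quoted in the question.
# The real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th colspan="3" class="clickable">Food</th></tr>
  <tr>
    <td class="item-name">Daily menu in the business district</td>
    <td class="price city-1">
        AU$15.82
    </td>
    <td class="price city-2">
        NZ$15.62
        <span style="white-space:nowrap;">(AU$12.10)</span>
    </td>
  </tr>
  <tr>
    <td class="item-name">Combo meal in fast food restaurant (Big Mac Meal or similar)</td>
    <td class="price city-1">AU$7.40</td>
    <td class="price city-2">NZ$8.16</td>
  </tr>
</table>
"""


class PriceTableParser(HTMLParser):
    """Collect (category, item, city-2 price, city-1 price) tuples."""

    # Map a cell's exact class attribute to the field it fills.
    FIELDS = {
        "clickable": "category",
        "item-name": "item",
        "price city-1": "city1",
        "price city-2": "city2",
    }

    def __init__(self):
        super().__init__()
        self.rows = []
        self.category = ""   # most recent category heading seen
        self.current = {}    # fields gathered for the row in progress
        self.field = None    # field currently being filled, if any
        self.skip_depth = 0  # > 0 while inside a nested <span>

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.field = self.FIELDS.get(dict(attrs).get("class"))
            if self.field:
                self.current[self.field] = ""
        elif tag == "span" and self.field:
            # Skip the converted price shown in a nested span, e.g. (AU$12.10).
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.skip_depth:
            self.skip_depth -= 1
        elif tag in ("td", "th"):
            self.field = None
        elif tag == "tr":
            self._finish_row()

    def handle_data(self, data):
        if self.field and not self.skip_depth:
            self.current[self.field] += data

    def _finish_row(self):
        row = {key: value.strip() for key, value in self.current.items()}
        self.current = {}
        if "category" in row:
            self.category = row["category"]
        elif {"item", "city1", "city2"} <= row.keys():
            self.rows.append(
                (self.category, row["item"], row["city2"], row["city1"])
            )


def scrape_prices(html):
    """Return one tuple per item row found in the given HTML string."""
    parser = PriceTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


for row in scrape_prices(SAMPLE_HTML):
    print(", ".join(row))
```

For the first sample row this prints "Food, Daily menu in the business district, NZ$15.62, AU$15.82". Joining with ", " matches the requested printout, but an item name that itself contains a comma would make the line ambiguous, so the standard csv module may be a safer choice if the result is meant to be parsed later.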
