How to parse an HTML table with Python and BeautifulSoup and write to CSV


Question


I am trying to parse an HTML page, fetch the currency values, and write them to CSV. I have the following code:

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print


The problem is that I do not know how to retrieve only the currency values. I tried a regexp such as '^[0-9]{3}' (start with three digits), but it doesn't work.

Recommended answer


You'd be much better off picking out specific cells in the table. The td cells with the cell_c class contain the data you are interested in, and the last one is always the currency exchange rate:

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    # Skip header rows (no td cells) and rows that lack the five expected cells
    if len(cols) == 5 and 'cell_c' in cols[0].get('class', ''):
        # currency row
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate


With the data in separate variables, you can now turn the text into decimal numbers, store them in a database, or do whatever else you need.
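
Since the question also asks about writing to CSV, here is a minimal sketch of that last step using only the standard library. It assumes the cell texts have already been extracted into lists of five strings, as in the loop above. The function name rows_to_csv, the column headers, and the sample rows are all hypothetical, and the sample values are placeholders rather than real exchange rates. It is written for Python 3, unlike the Python 2 snippets above.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def rows_to_csv(rows, out):
    """Write currency rows to the file-like `out`; return the number written.

    `rows` is an iterable of lists of cell texts. Rows that do not have
    exactly five cells, or whose units/rate are not numeric, are skipped.
    """
    writer = csv.writer(out, delimiter=';')  # semicolons, as in the question
    writer.writerow(['digital_code', 'letter_code', 'units', 'name', 'rate'])
    written = 0
    for cells in rows:
        if len(cells) != 5:
            continue  # not a currency row
        digital_code, letter_code, units, name, rate = [c.strip() for c in cells]
        try:
            # Decimal keeps the rate exact instead of rounding through float
            rate_value = Decimal(rate)
            units_value = int(units)
        except (InvalidOperation, ValueError):
            continue  # e.g. a header row containing text instead of numbers
        writer.writerow([digital_code, letter_code, units_value, name, rate_value])
        written += 1
    return written


# Placeholder rows for illustration only (NOT real exchange rates)
sample_rows = [
    ['Digital code', 'Letter code', 'Units', 'Name', 'Rate'],
    ['036', 'AUD', '100', 'Australian Dollar', '123.4500'],
    ['840', 'USD', '100', 'US Dollar', ' 678.9000 '],
]

buffer = io.StringIO()
count = rows_to_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to a real file instead of an in-memory buffer, open it with open('rates.csv', 'w', newline='') and pass the file object as out; the file name here is just an example.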
