Extracting data from a table using BeautifulSoup

Problem description

I'm scraping this page for my Android app. I'd like to extract the data in the table of cities and area codes.

Here is my code:

from bs4 import BeautifulSoup
import urllib2
import re

base_url = "http://www.howtocallabroad.com/taiwan/"
html_page = urllib2.urlopen(base_url)
soup = BeautifulSoup(html_page)
codes = soup.select("#codes tbody > tr > td")
for area_code in codes:
    # print td city and area code
    pass  # placeholder so the loop parses; this is the step being asked about

I'd like to know which function in Python or BeautifulSoup gets the value out of <td>value</td>.

Sorry, I'm just an Android dev learning to write Python.

Recommended answer

You can use findAll(), along with a function that breaks a list up into chunks:

>>> areatable = soup.find('table',{'id':'codes'})
>>> def chunks(l, n):
...     return [l[i:i+n] for i in range(0, len(l), n)]
...
>>> dict(chunks([i.text for i in areatable.findAll('td')], 2))
{u'Chunan': u'36', u'Penghu': u'69', u'Wufeng': u'4', u'Fengyuan': u'4', u'Kaohsiung': u'7', u'Changhua': u'47', u'Pingtung': u'8', u'Keelung': u'2', u'Hsinying': u'66', u'Chungli': u'34', u'Suao': u'39', u'Yuanlin': u'48', u'Yungching': u'48', u'Panchiao': u'2', u'Taipei': u'2', u'Tainan': u'62', u'Peikang': u'5', u'Taichung': u'4', u'Yungho': u'2', u'Hsinchu': u'35', u'Tsoying': u'7', u'Hualien': u'38', u'Lukang': u'47', u'Talin': u'5', u'Chiaochi': u'39', u'Fengshan': u'7', u'Sanchung': u'2', u'Tungkang': u'88', u'Taoyuan': u'33', u'Hukou': u'36'}

Explanation:

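.find() finds the table with an id of codes. The chunks function is used to split a list into evenly sized chunks (see http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python).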

As findAll returns a list, we use chunks on the list to create something like:

[[u'Changhua', u'47'], [u'Keelung', u'2'], etc]
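
To make the chunking step concrete, here is a minimal standalone sketch of the same chunks function in Python 3. The cells list is a hypothetical stand-in for the flat list of td texts, not data fetched from the page.

```python
def chunks(items, n):
    """Split items into consecutive slices of length n (the last may be shorter)."""
    return [items[i:i + n] for i in range(0, len(items), n)]

# Hypothetical flat list, in the order the <td> cells would be read:
# city, code, city, code, ...
cells = ["Changhua", "47", "Keelung", "2", "Taipei", "2"]

# Group every two consecutive items into one [city, code] pair.
pairs = chunks(cells, 2)
print(pairs)
```

Each inner list holds one city followed by its area code, which is the shape dict() needs.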

i.text for i in... is used to get the text of each td tag, otherwise the <td> and </td> would remain.

Finally, dict() is called to convert the list of lists into a dictionary, which you can use to look up a city's area code.
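
For readers on Python 3 (where urllib2 has been replaced by urllib.request), the sketch below shows one possible variant that walks the table row by row instead of chunking. It is not the answer's method. The inline HTML is a hypothetical stand-in for the real page, and it assumes each row holds exactly one city cell and one code cell; if a row holds several city/code pairs, the chunking approach above is the better fit.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
html = """
<table id="codes">
  <tr><th>City</th><th>Area code</th></tr>
  <tr><td>Taipei</td><td>2</td></tr>
  <tr><td>Kaohsiung</td><td>7</td></tr>
  <tr><td>Tainan</td><td>62</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "codes"})

area_codes = {}
for row in table.find_all("tr"):
    # get_text() returns the cell contents without the surrounding tags.
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) == 2:  # the header row has only <th> cells, so it is skipped
        city, code = cells
        area_codes[city] = code

print(area_codes["Tainan"])
```

The resulting dictionary maps city names to area codes as strings, so a lookup such as area_codes["Tainan"] returns the code for that city.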
