Extracting data from a table using BeautifulSoup
Question
I'm scraping this page for my Android app. I'd like to extract the data in the table of cities and area codes.
Here is my code:
from bs4 import BeautifulSoup
import urllib2  # Python 2 module; on Python 3 use urllib.request instead
import re

base_url = "http://www.howtocallabroad.com/taiwan/"
html_page = urllib2.urlopen(base_url)
soup = BeautifulSoup(html_page)
# Select every cell in the body of the table whose id is "codes"
codes = soup.select("#codes tbody > tr > td")
for area_code in codes:
    # print td city and area code
    pass
I'd like to know which function in Python or in BeautifulSoup gets the value out of <td>value</td>.
Sorry, I'm just an Android dev learning to write Python.
Recommended answer
You can use findAll(), along with a function that breaks a list up into chunks:
>>> areatable = soup.find('table', {'id': 'codes'})
>>> def chunks(l, n):
...     return [l[i:i+n] for i in range(0, len(l), n)]
...
>>> dict(chunks([i.text for i in areatable.findAll('td')], 2))
{u'Chunan': u'36', u'Penghu': u'69', u'Wufeng': u'4', u'Fengyuan': u'4', u'Kaohsiung': u'7', u'Changhua': u'47', u'Pingtung': u'8', u'Keelung': u'2', u'Hsinying': u'66', u'Chungli': u'34', u'Suao': u'39', u'Yuanlin': u'48', u'Yungching': u'48', u'Panchiao': u'2', u'Taipei': u'2', u'Tainan': u'62', u'Peikang': u'5', u'Taichung': u'4', u'Yungho': u'2', u'Hsinchu': u'35', u'Tsoying': u'7', u'Hualien': u'38', u'Lukang': u'47', u'Talin': u'5', u'Chiaochi': u'39', u'Fengshan': u'7', u'Sanchung': u'2', u'Tungkang': u'88', u'Taoyuan': u'33', u'Hukou': u'36'}
Explanation:
.find() finds the table with an id of codes. The chunks function splits a list into evenly sized chunks (see http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python).
As findAll returns a list, we use chunks on the list of cell texts to create something like:
[[u'Changhua', u'47'], [u'Keelung', u'2'], etc]
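To see the pairing step on its own, here is a minimal sketch that runs without any scraping. The hand-typed cell_texts list is an assumption standing in for the td texts, with values borrowed from the output above; the parameter name items is also just illustrative.

```python
def chunks(items, n):
    """Split items into consecutive slices of length n (the last may be shorter)."""
    return [items[i:i + n] for i in range(0, len(items), n)]

# Hand-typed stand-in for [i.text for i in areatable.findAll('td')]
cell_texts = ["Changhua", "47", "Keelung", "2", "Taipei", "2"]

# With n=2, each chunk is one [city, area_code] pair
pairs = chunks(cell_texts, 2)
print(pairs)
```

This only works because the cells alternate city, code, city, code; a table with a different number of columns would need a different chunk size.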
i.text for i in... gets the text of each td tag; otherwise the <td> and </td> would remain.
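The difference between a tag and its text is easy to check on a tiny document. This is a sketch for Python 3 with bs4 installed; the one-row HTML string is made up for illustration and is not the real page's markup.

```python
from bs4 import BeautifulSoup

# Made-up one-row table, standing in for the real page
html = "<table id='codes'><tr><td>Taipei</td><td>2</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

first_cell = soup.find("table", {"id": "codes"}).find("td")

as_markup = str(first_cell)  # the tag itself, markup included
as_text = first_cell.text    # only the text inside the tag

print(as_markup)
print(as_text)
```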
Finally, dict() is called to convert the list of lists into a dictionary, which you can use to look up a city's area code.
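Putting the steps together, the following is a sketch of the same approach for Python 3. It parses an inline SAMPLE_HTML string instead of fetching the page, so the table layout here is an assumption rather than the site's actual markup. It also uses find_all, the current spelling of findAll in bs4.

```python
from bs4 import BeautifulSoup

# Assumed layout: a header row plus one city/code pair per row
SAMPLE_HTML = """
<table id="codes">
  <tr><th>City</th><th>Area code</th></tr>
  <tr><td>Taipei</td><td>2</td></tr>
  <tr><td>Kaohsiung</td><td>7</td></tr>
  <tr><td>Hsinchu</td><td>35</td></tr>
</table>
"""

def chunks(items, n):
    """Split items into consecutive slices of length n."""
    return [items[i:i + n] for i in range(0, len(items), n)]

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"id": "codes"})

# Only td cells are collected, so the th header row is skipped
cell_texts = [td.text.strip() for td in table.find_all("td")]

# Pair up [city, code] and turn the pairs into a lookup table
area_codes = dict(chunks(cell_texts, 2))
print(area_codes["Kaohsiung"])
```

To run this against the live page, the sample string could be replaced with the response body from urllib.request.urlopen(base_url), assuming the page still serves a table with the id codes.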