BeautifulSoup HTML table parsing

Views: 34  Published: 2021/12/23 20:35:55  Tags: python beautifulsoup html-table html-parsing mechanize

Problem description

I am trying to parse the HTML tables on this site: http://www.511virginia.org/RoadConditions.aspx?j=All&r=1

Currently I am using BeautifulSoup, and my code looks like this:

from mechanize import Browser
from BeautifulSoup import BeautifulSoup

mech = Browser()

url = "http://www.511virginia.org/RoadConditions.aspx?j=All&r=1"
page = mech.open(url)

html = page.read()
soup = BeautifulSoup(html)

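# First <table> on the page
table = soup.find("table")

# Fourth <tr> of that table (index 3); this is a single row, not a list of rows
row = table.findAll('tr')[3]

# Every <td> cell in that row
cols = row.findAll('td')

roadtype = cols[0].string
start = cols[1].string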
end = cols[2].string
condition = cols[3].string
reason = cols[4].string
update = cols[5].string

entry = (roadtype, start, end, condition, reason, update)

print entry

The issue is with the start and end columns: they are printed as None.

Output:

(u'Rt. 613N (Giles County)', None, None, u'Moderate', u'snow or ice', u'01/13/2010 10:50 AM')

I know the cells are stored in the cols list, but it seems the extra link (<a>) tag is messing up the parsing. The original HTML looks like this:

<td headers="road-type" class="ConditionsCellText">Rt. 613N (Giles County)</td>
<td headers="start" class="ConditionsCellText"><a href="conditions.aspx?lat=37.43036753&long=-80.51118005#viewmap">Big Stony Ck Rd; Rt. 635E/W (Giles County)</a></td>
<td headers="end" class="ConditionsCellText"><a href="conditions.aspx?lat=37.43036753&long=-80.51118005#viewmap">Cabin Ln; Rocky Mount Rd; Rt. 721E/W (Giles County)</a></td>
<td headers="condition" class="ConditionsCellText">Moderate</td>
<td headers="reason" class="ConditionsCellText">snow or ice</td>
<td headers="update" class="ConditionsCellText">01/13/2010 10:50 AM</td>
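In BeautifulSoup 3 (the package imported above), a tag's .string is only set when its single child is a text node. The start and end cells contain an <a> tag instead, so .string returns None. Below is a minimal sketch of two workarounds, assuming the current bs4 package under Python 3 rather than BeautifulSoup 3: read .string from the link itself, or collect all text under the cell with get_text(). bs4 does follow a lone child tag for .string, but a cell that mixes plain text and a link (the hypothetical "See map" cell below) still gives None there.

```python
from bs4 import BeautifulSoup

# The "start" cell quoted above: the <td>'s only child is an <a> tag.
CELL = (
    '<td headers="start" class="ConditionsCellText">'
    '<a href="conditions.aspx?lat=37.43036753&long=-80.51118005#viewmap">'
    "Big Stony Ck Rd; Rt. 635E/W (Giles County)</a></td>"
)
td = BeautifulSoup(CELL, "html.parser").td

# Option 1: step into the link; its only child is a text node.
via_link = td.a.string

# Option 2: collect every text node under the cell, whatever the nesting.
via_text = td.get_text(" ", strip=True)

# Hypothetical cell mixing plain text and a link: it has two children,
# so .string is None even in bs4, while get_text() still returns the text.
mixed = BeautifulSoup('<td>See <a href="#">map</a></td>', "html.parser").td

print(via_link)
print(via_text)
print(mixed.string)
print(mixed.get_text(" ", strip=True))
```

get_text() is the more general choice, since it works the same whether or not a cell contains a link.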

So what should be printed is:

(u'Rt. 613N (Giles County)', u'Big Stony Ck Rd; Rt. 635E/W (Giles County)', u'Cabin Ln; Rocky Mount Rd; Rt. 721E/W (Giles County)', u'Moderate', u'snow or ice', u'01/13/2010 10:50 AM')
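A tuple like the one above can be built for every row, not only row [3]. This is a minimal sketch, again using bs4 under Python 3 rather than BeautifulSoup 3. It assumes, as a hypothetical rule, that data rows are exactly the <tr> elements with six <td> cells in the order shown above. The HTML is passed in as a string instead of being fetched with mechanize, and the header row in the sample is invented for illustration.

```python
from bs4 import BeautifulSoup


def parse_conditions(html):
    """Return one (roadtype, start, end, condition, reason, update) tuple per data row.

    Assumption: a data row is any <tr> with exactly six <td> cells in the
    order shown in the question; header rows (<th> only) are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) != 6:
            continue
        # get_text() includes text nested inside <a> tags, so the linked
        # start/end cells come back as text instead of None.
        entries.append(tuple(td.get_text(" ", strip=True) for td in cells))
    return entries


# Hypothetical sample: an invented header row plus the data row quoted above
# (links shortened to href="#").
SAMPLE = """
<table>
<tr><th>Road</th><th>Start</th><th>End</th><th>Condition</th><th>Reason</th><th>Updated</th></tr>
<tr>
<td>Rt. 613N (Giles County)</td>
<td><a href="#">Big Stony Ck Rd; Rt. 635E/W (Giles County)</a></td>
<td><a href="#">Cabin Ln; Rocky Mount Rd; Rt. 721E/W (Giles County)</a></td>
<td>Moderate</td>
<td>snow or ice</td>
<td>01/13/2010 10:50 AM</td>
</tr>
</table>
"""

for entry in parse_conditions(SAMPLE):
    print(entry)
```

Filtering by cell count is only a heuristic. If the real page contains other six-column tables, selecting the right table first and then iterating its rows would be safer.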

Any suggestions or help are appreciated. Thanks in advance.
