Testing for tags with no content with BeautifulSoup in Python

Problem description

I'm working with BeautifulSoup in Python to scrape a webpage. The HTML in question looks like this:

<td><a href="blah.html">blahblah</a></td>
<td>line2</td>
<td></td>

I want to extract the contents of each td tag. For the first td I need the text "blahblah", for the next td I want to write "line2", and for the last td I want to write "blank" because it has no content.

My code snippet looks like this:

row = []
for each_td in td:
    link = each_td.find_all('a')
    if link:
        row.append(link[0].contents[0])
        row.append(link[0]['href'])
    elif each_td.contents[0] is None:
        row.append('blank')
    else:
        row.append(each_td.contents[0])
print(row)

However, when I run it, I get this error:

elif each_td.contents[0] is None:
IndexError: list index out of range

Note: I am working with BeautifulSoup.

How do I test for the "no-content-td" and write "blank" for it? Why is the "... is None" check not working?

Recommended answer

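Who said that 'contents' always has at least one element? Evidently you hit a case where 'contents' has no elements, and that is why you get this error. For an empty tag such as <td></td>, each_td.contents is an empty list, so each_td.contents[0] raises IndexError before the "is None" comparison is ever evaluated.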
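The failure can be reproduced without BeautifulSoup at all. The sketch below uses a plain empty list as a stand-in for the 'contents' of an empty <td></td> (an assumption that mirrors what BeautifulSoup returns for such a tag): the index is evaluated before "is None", so the comparison never runs, while testing the list itself is safe.

```python
# Stand-in for each_td.contents on an empty <td></td>: BeautifulSoup
# gives an empty list there, not a list containing None.
contents = []

try:
    # contents[0] is evaluated first and raises, so "is None" never runs.
    result = contents[0] is None
except IndexError as exc:
    result = f"IndexError: {exc}"

print(result)             # IndexError: list index out of range
print(bool(contents))     # False: an empty list is falsy
print(len(contents) > 0)  # False: the same test, spelled out
```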

A more appropriate check is:

if each_td.contents:

or

if len(each_td.contents) > 0:
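Putting that check into the question's loop gives something like the following minimal sketch. It assumes BeautifulSoup 4 (the bs4 package) with the built-in "html.parser" backend, and uses the question's HTML with the closing quote on href restored. It also reads the link text with get_text() rather than link[0].contents[0], so an empty <a></a> cannot trigger the same IndexError.

```python
from bs4 import BeautifulSoup

# The question's HTML fragment, with the closing quote on href restored.
html = """
<td><a href="blah.html">blahblah</a></td>
<td>line2</td>
<td></td>
"""

soup = BeautifulSoup(html, "html.parser")

row = []
for each_td in soup.find_all("td"):
    link = each_td.find_all("a")
    if link:
        # Cell holds a link: keep its text and its target.
        row.append(link[0].get_text())
        row.append(link[0]["href"])
    elif each_td.contents:
        # Non-empty list, so index 0 is safe; str() drops the NavigableString type.
        row.append(str(each_td.contents[0]))
    else:
        # Empty list: the <td></td> case.
        row.append("blank")

print(row)  # ['blahblah', 'blah.html', 'line2', 'blank']
```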

But your assumption that 'contents' is never empty is simply wrong.
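If cells that contain only whitespace should also count as "blank", testing 'contents' is not enough, because a whitespace string is still a child of the tag. One possible alternative is BeautifulSoup's get_text(strip=True), which returns an empty string in both cases. The sketch below adds a hypothetical whitespace-only cell to the sample to show the difference; it again assumes bs4 with "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the question's plain cells plus a whitespace-only cell.
html = "<td>line2</td><td></td><td> </td>"
soup = BeautifulSoup(html, "html.parser")

by_contents = []
by_text = []
for cell in soup.find_all("td"):
    # .contents is [] only for a truly empty tag; " " still counts as a child.
    by_contents.append("blank" if not cell.contents else str(cell.contents[0]))
    # get_text(strip=True) returns "" for empty and whitespace-only cells alike.
    by_text.append(cell.get_text(strip=True) or "blank")

print(by_contents)  # ['line2', 'blank', ' ']
print(by_text)      # ['line2', 'blank', 'blank']
```

Which test fits depends on whether whitespace-only cells should count as content; the 'contents' check treats them as non-empty.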
