BeautifulSoup 4 + Python: .string returns 'None'

Problem description

I'm trying to parse some HTML with BeautifulSoup4 and Python 2.7.6, but .string returns None. The HTML I'm trying to parse is:

<div class="booker-booking">
    2&nbsp;rooms
    &#0183;
    USD&nbsp;0
    <!-- Commission: USD  -->
</div>

The Python snippet I have is:

data = soup.find('div', class_='booker-booking').string

I've also tried the following two:

data = soup.find('div', class_='booker-booking').text
data = soup.find('div', class_='booker-booking').contents[0]

Both return:

u'\n\t\t2\xa0rooms \n\t\t\xb7\n\t\tUSD\xa00\n\t\t\n

Ultimately, I want the first line in one variable containing just "2 rooms", and the third line in another variable containing just "USD 0".

Recommended answer

.string returns None because the text node is not the div's only child: the div also contains a comment. BeautifulSoup only returns a value from .string when a tag has exactly one child.
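One way to confirm this is to list the div's children and look at their node types. The following is a minimal sketch, assuming Python 3 and the built-in html.parser backend (the question itself used Python 2.7.6); the names HTML, child_types and has_comment are made up for illustration.

```python
from bs4 import BeautifulSoup, Comment

# The markup from the question, reproduced as a string (illustrative name).
HTML = """<div class="booker-booking">
    2&nbsp;rooms
    &#0183;
    USD&nbsp;0
    <!-- Commission: USD  -->
</div>"""

soup = BeautifulSoup(HTML, "html.parser")
div = soup.find("div", class_="booker-booking")

# One entry per direct child of the div, naming its node type.
child_types = [type(child).__name__ for child in div.contents]

# True when at least one direct child is an HTML comment.
has_comment = any(isinstance(child, Comment) for child in div.contents)

print(child_types)
print(has_comment)
print(div.string)  # None, because the div has more than one child
```

Because the comment sits next to the text, the div has several children, and .string has no single string to hand back.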

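To get the text without the comment, join only the text nodes that are not comments:

from bs4 import BeautifulSoup, Comment

# html holds the markup shown in the question
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
div = soup.find('div', 'booker-booking')
# join the text nodes, skipping comments
text = " ".join(div.find_all(text=lambda t: not isinstance(t, Comment)))
# -> u'\n    2\xa0rooms\n    \xb7\n    USD\xa00\n     \n'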

To collapse Unicode whitespace (including the non-breaking spaces):

text = " ".join(text.split())
# -> u'2 rooms \xb7 USD 0'
print text
# -> 2 rooms · USD 0
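The reason split() works here is that, called with no argument on a Unicode string, it breaks on every Unicode whitespace character, not just the ASCII space. A small standard-library sketch of the difference, assuming Python 3 (in Python 2 the input would have to be a unicode object, as it is in the answer); raw, normalized and ascii_only are illustrative names.

```python
# Text shaped like the joined output above, with non-breaking spaces (U+00A0).
raw = "\n    2\u00a0rooms\n    \u00b7\n    USD\u00a00\n     \n"

# split() with no argument breaks on any Unicode whitespace, so the
# non-breaking spaces go away together with newlines and indentation.
normalized = " ".join(raw.split())

# split(" ") breaks only on the ASCII space, so U+00A0 and newlines survive.
ascii_only = [piece for piece in raw.split(" ") if piece]

print(normalized)
print(ascii_only)
```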

To get the final variables:

var1, var2 = [s.strip() for s in text.split(u"\xb7")]
# -> u'2 rooms', u'USD 0'
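For readers on Python 3, the steps above can be gathered into one function. This is only a sketch under stated assumptions: split_booking, its separator parameter and SAMPLE are hypothetical names that do not appear in the original answer, html.parser is an assumed parser choice, and the comment filter walks div.descendants instead of calling find_all, which is a swapped-in way of doing the same filtering.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# The markup from the question (illustrative name).
SAMPLE = """<div class="booker-booking">
    2&nbsp;rooms
    &#0183;
    USD&nbsp;0
    <!-- Commission: USD  -->
</div>"""


def split_booking(html, separator="\u00b7"):
    """Return the cleaned text parts on either side of the separator."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="booker-booking")
    if div is None:
        return []  # no matching div in this markup
    # Keep text nodes only; Comment is a NavigableString subclass,
    # so it has to be excluded explicitly.
    pieces = [
        node for node in div.descendants
        if isinstance(node, NavigableString) and not isinstance(node, Comment)
    ]
    # Collapse every run of Unicode whitespace into a single space.
    text = " ".join(" ".join(pieces).split())
    return [part.strip() for part in text.split(separator)]


rooms, price = split_booking(SAMPLE)
print(rooms)
print(price)
```

Returning a list keeps the function usable when the markup has more or fewer separators than expected; the two-variable unpacking on the last lines assumes exactly one middle dot, as in the question's HTML.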
