How to use Python Beautiful Soup to get only the level 1 navigableText?


Question

I am using Beautiful Soup to get the text from this example HTML code:

....
<div style="s1">
    <div style="s2">Here is text 1</div>
    <div style="s3">Here is text 2</div>
Here is text 3 and this is what I want.
</div>
....

Text 1 and text 2 are both at level 2, and text 3 is one level up, at level 1. I only want to get text 3, and I used this:

for anchor in tbody.findAll('div', style="s1"):
    review=anchor.text
    print review

But this code gets me all of the text: 1, 2, and 3. How do I get only the first-level text 3?

Recommended answer

Something like:

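# Assumes `import bs4` and that `tbody` is an already-parsed tag.
for anchor in tbody.findAll('div', style="s1"):
    # Keep only the strings that are direct children of this div.
    text = ''.join([x for x in anchor.contents if isinstance(x, bs4.element.NavigableString)])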

works. Just know that you'll also get the line breaks in there, so calling .strip() on the result might be necessary.

For example:

for anchor in tbody.findAll('div', style="s1"):
    text = ''.join([x for x in anchor.contents if isinstance(x, bs4.element.NavigableString)])
    print([text])
    print([text.strip()])

prints

[u'\n\n\nHere is text 3 and this is what I want.\n']
[u'Here is text 3 and this is what I want.']

(I put them in lists so you could see the newlines.)
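
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea applied to the HTML from the question. The helper name direct_text and the choice of the built-in "html.parser" are assumptions made for illustration; they are not part of the original answer, and the whitespace you see before stripping depends on how the HTML is indented.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# The example markup from the question, parsed on its own here.
HTML = """
<div style="s1">
    <div style="s2">Here is text 1</div>
    <div style="s3">Here is text 2</div>
Here is text 3 and this is what I want.
</div>
"""


def direct_text(tag):
    """Join only the strings that are direct children of `tag` (hypothetical helper)."""
    pieces = [child for child in tag.contents if isinstance(child, NavigableString)]
    # Direct-child strings include the whitespace between nested tags, so strip it.
    return "".join(pieces).strip()


soup = BeautifulSoup(HTML, "html.parser")
results = [direct_text(div) for div in soup.find_all("div", style="s1")]
print(results)
```

The nested divs are Tag objects rather than NavigableString objects, so their text is skipped, whereas anchor.text walks the whole subtree and returns all three strings.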
