How do I ignore tags while getting the .string of a Beautiful Soup element?
Problem description
I'm working with HTML elements that have child tags, which I want to "ignore" or remove so that only the text remains. Right now, if I call .string on any element that contains tags, all I get is None.
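import bs4

# Name the parser explicitly so the result does not depend on what is installed
soup = bs4.BeautifulSoup("""
<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>
""", "html.parser")

main = soup.find(id='main')
for child in main.children:
    # .string is None when a tag has more than one child node
    print(child.string)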
Output:
This is a paragraph.
None
This is another paragraph.
I want the second line to be "This is a paragraph with a tag." How do I do this?
Solution
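for child in soup.find(id='main'):
    # Skip the whitespace strings that sit between the <p> tags
    if isinstance(child, bs4.Tag):
        # .text joins all text inside the tag, including nested tags
        print(child.text)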
And, you'll get:
This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
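For comparison, here is a minimal self-contained sketch of the same idea using get_text(), which is the method form of .text. It assumes Python 3 and the built-in "html.parser". The helper name paragraph_texts and the HTML constant are made up for illustration and are not part of the original answer.

```python
import bs4

# Same markup as in the question (held in a hypothetical constant)
HTML = """
<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>
"""


def paragraph_texts(html):
    """Return the full text of each <p> directly under #main."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    main = soup.find(id="main")
    # recursive=False restricts the search to direct children, and asking
    # for "p" tags skips the whitespace strings between them
    paragraphs = main.find_all("p", recursive=False)
    # get_text() concatenates every string inside the tag, nested or not
    return [p.get_text() for p in paragraphs]


texts = paragraph_texts(HTML)

# For contrast: .string on the second paragraph is None, because that
# tag has several children (text, <span>, text) rather than exactly one
second = bs4.BeautifulSoup(HTML, "html.parser").find_all("p")[1]
second_string = second.string

for text in texts:
    print(text)
```

If the nested tags hold text that should be separated or trimmed, get_text() also accepts a separator string and strip=True. Whether that is needed depends on the markup being parsed.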