BeautifulSoup 4: Remove a comment tag and its content


Question


The page I'm scraping contains the HTML below. How do I remove the comment tag <!-- --> along with its content using bs4?

<div class="foo">
cat dog sheep goat
<!-- 
<p>NewPP limit report
Preprocessor node count: 478/300000
Post‐expand include size: 4852/2097152 bytes
Template argument size: 870/2097152 bytes
Expensive parser function count: 2/100
ExtLoops count: 6/100
</p>
-->

</div>

Answer

You can use extract(). From the Beautiful Soup documentation:


PageElement.extract() removes a tag or string from the tree. It returns the tag or string that was extracted.
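A minimal sketch of that return value, assuming bs4 is installed and using the built-in "html.parser" backend. The extracted Comment keeps its text, so you can still inspect or log what was removed; the sample markup and variable names are illustrative only.

```python
from bs4 import BeautifulSoup, Comment

html = '<div class="foo">cat dog<!-- <p>NewPP limit report</p> --></div>'
soup = BeautifulSoup(html, "html.parser")

# find() with a function as the string filter returns the first matching string node
comment = soup.find(string=lambda s: isinstance(s, Comment))

# extract() detaches the comment from the tree and returns that same object
removed = comment.extract()

print(repr(str(removed)))  # the comment's inner text, without the <!-- --> delimiters
print(soup)                # the div, now without the comment
```

Because extract() hands back the removed node rather than destroying it, you could re-insert it elsewhere or collect removed comments for debugging.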

from bs4 import BeautifulSoup, Comment

data = """<div class="foo">
cat dog sheep goat
<!--
<p>test</p>
-->
</div>"""

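# Name a parser explicitly; without one, bs4 guesses and emits a warning
soup = BeautifulSoup(data, "html.parser")

div = soup.find('div', class_='foo')
# Calling a tag is shorthand for find_all(); keep only Comment strings
for element in div(string=lambda text: isinstance(text, Comment)):
    element.extract()  # detach the comment, including everything inside it

print(soup.prettify())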


As a result, you get the div without the comment:

<div class="foo">
 cat dog sheep goat
</div>
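If you want every comment in the page removed, not just those inside div.foo, the same loop can run over the whole soup. The sketch below assumes bs4 is installed and uses the "html.parser" backend; strip_comments is a hypothetical helper name, not part of the bs4 API.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(markup: str) -> str:
    """Return markup with every HTML comment (and its content) removed.

    Hypothetical helper: searches the whole document instead of one div.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # find_all() returns a list up front, so extracting inside the loop
    # does not cause any matches to be skipped.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return str(soup)


page = """<div class="foo">
cat dog sheep goat
<!--
<p>NewPP limit report</p>
-->
</div>
<p>kept<!-- a second comment --></p>"""

print(strip_comments(page))
```

The whitespace that surrounded each comment stays behind as ordinary text; use prettify() or strip the strings afterwards if that matters.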
