Beautiful Soup 4: Remove comment tag and its content

Views: 22  Published: 2021/12/23 20:42:05  Tags: python html web-scraping html-parsing beautifulsoup


Problem description

The page that I'm scraping contains the HTML below. How do I remove the comment tag <!-- --> along with its content using bs4?

<div class="foo">
cat dog sheep goat
<!-- 
<p>NewPP limit report
Preprocessor node count: 478/300000
Post‐expand include size: 4852/2097152 bytes
Template argument size: 870/2097152 bytes
Expensive parser function count: 2/100
ExtLoops count: 6/100
</p>
-->
</div>
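Why does removing the comment also remove the <p> inside it? A quick check, assuming the built-in "html.parser" backend, suggests that Beautiful Soup does not parse markup inside <!-- ... -->: the whole block becomes a single Comment object (a NavigableString subclass), so there is no separate <p> tag to delete. The snippet below uses a shortened version of the question's HTML.

```python
from bs4 import BeautifulSoup, Comment

# Shortened version of the HTML from the question
html = """<div class="foo">
cat dog sheep goat
<!--
<p>NewPP limit report</p>
-->
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="foo")

# Collect every string node inside the div that is a Comment
comments = div.find_all(string=lambda s: isinstance(s, Comment))

print(len(comments))         # 1: the whole <!-- ... --> block is one node
print("<p>" in comments[0])  # True: the <p> markup is just text inside it
print(div.find("p"))         # None: no real <p> Tag exists in the tree
```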

Solution

You can use extract() (the solution is based on this answer):

PageElement.extract() removes a tag or string from the tree. It returns the tag or string that was extracted.

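from bs4 import BeautifulSoup, Comment

data = """<div class="foo">
cat dog sheep goat
<!--
<p>test</p>
-->
</div>"""

# Name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(data, "html.parser")

div = soup.find('div', class_='foo')
# div(...) is shorthand for div.find_all(...); string= (the newer name for text=)
# matches only the Comment nodes inside the div
for element in div(string=lambda text: isinstance(text, Comment)):
    element.extract()  # detach the comment, including everything between <!-- and -->

print(soup.prettify())  # Python 3 print function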

As a result, you get the div without the comment:

<div class="foo">
 cat dog sheep goat
</div>
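The answer only cleans the div with class foo. If you want to strip comments from the whole page, the same find_all()/extract() pattern can be applied to the soup object itself. The sketch below wraps it in a small helper; the name remove_comments is hypothetical (not part of bs4), and because extract() returns what it removes, the helper hands back the removed comments in case you want to inspect them.

```python
from bs4 import BeautifulSoup, Comment


def remove_comments(soup):
    """Hypothetical helper: extract every HTML comment from `soup` in place
    and return the removed Comment objects."""
    # find_all() returns a list, so extracting while looping over it is safe
    comments = soup.find_all(string=lambda s: isinstance(s, Comment))
    return [comment.extract() for comment in comments]


html = """<html><body>
<!-- header comment -->
<div class="foo">cat dog sheep goat<!-- <p>test</p> --></div>
<p>footer<!-- another --></p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")
removed = remove_comments(soup)

print(len(removed))                               # 3
print(soup.find("div", class_="foo").get_text())  # cat dog sheep goat
```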
