Beautiful Soup 4: Remove comment tag and its content
Problem description
The page that I'm scraping contains the HTML below. How do I remove the comment tag <!-- --> along with its content using bs4?
<div class="foo">
cat dog sheep goat
<!--
<p>NewPP limit report
Preprocessor node count: 478/300000
Post‐expand include size: 4852/2097152 bytes
Template argument size: 870/2097152 bytes
Expensive parser function count: 2/100
ExtLoops count: 6/100
</p>
-->
</div>
Solution
You can use extract() (solution is based on this answer):
PageElement.extract() removes a tag or string from the tree. It
returns the tag or string that was extracted.
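from bs4 import BeautifulSoup, Comment
data = """<div class="foo">
cat dog sheep goat
<!--
<p>test</p>
-->
</div>"""
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(data, "html.parser")
div = soup.find('div', class_='foo')
# Calling a tag is shorthand for find_all(); string= is the current name
# of the older text= argument (Beautiful Soup 4.4.0 and later).
for element in div(string=lambda text: isinstance(text, Comment)):
    element.extract()
print(soup.prettify())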
As a result you get your div without comments:
<div class="foo">
cat dog sheep goat
</div>
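The same idea can be applied to a whole document rather than a single div. The following is a minimal sketch, not part of the original answer: strip_comments is a hypothetical helper name, and it assumes the built-in "html.parser" backend and Beautiful Soup 4.4.0 or later (for the string= argument).

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html):
    """Return html with every comment node removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(string=...) calls the predicate on each text node.
    # Comment is a NavigableString subclass, so isinstance() picks out comments.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # extract() detaches the node from the tree and returns it.
        comment.extract()
    return str(soup)


html = '<div class="foo">cat dog<!-- <p>test</p> --> sheep goat</div>'
cleaned = strip_comments(html)
print(cleaned)
```

Because the markup inside a comment is stored as one string node, the <p> inside it is never parsed as a tag and disappears together with the comment.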