How to find all comments with Beautiful Soup


Problem description

This question was asked four years ago, but the answer is now out of date for BS4.

I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 treats each comment as a special type of NavigableString, I thought this code would work:

for comments in soup.find_all('comment'):
     comments.decompose()

That didn't work. How do I find all comments using BS4?

Solution

You can pass a function to find_all() to help it check whether the string is a Comment.

For example, given the following HTML:

<body>
   <!-- Branding and main navigation -->
   <div class="Branding">The Science &amp; Safety Behind Your Favorite Products</div>
   <div class="l-branding">
      <p>Just a brand</p>
   </div>
   <!-- test comment here -->
   <div class="block_content">
      <a href="https://www.google.com">Google</a>
   </div>
</body>

Code:

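from bs4 import BeautifulSoup as BS
from bs4 import Comment

# .... (html is assumed to hold the markup shown above)
soup = BS(html, 'html.parser')
# Keep only the strings that are Comment instances
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    print(c)
    print("===========")
    c.extract()  # remove the comment from the tree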

The output would be:

 Branding and main navigation 
===========
 test comment here 
===========
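
To check that the comments are really gone from the document, the same approach can be wrapped in a small helper. This is only a sketch: strip_comments and the sample string are hypothetical names made up for illustration, and it assumes bs4 is installed and the built-in html.parser is used.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html: str) -> str:
    """Return html with every comment node removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list, so removing nodes while looping over it is safe
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return str(soup)


# Shortened version of the example markup, for illustration only
sample = (
    "<body><!-- Branding and main navigation -->"
    "<p>Just a brand</p>"
    "<!-- test comment here --></body>"
)
cleaned = strip_comments(sample)
print(cleaned)
```

The helper serializes the tree back to a string after the loop, so the result can be written straight to a file.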

BTW, I think the reason find_all('Comment') doesn't work is explained in the Beautiful Soup documentation:

Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don’t match.
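
The difference can be seen side by side in a short sketch. The markup and variable names here are made up for illustration, and the snippet assumes bs4 with the built-in html.parser: a bare string is matched against tag names only, while a function passed as string= is tested against each text node.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup, for illustration only
html = "<div><!-- a note --><p>text</p></div>"
soup = BeautifulSoup(html, "html.parser")

# A bare string is treated as a tag name: this looks for <comment> tags
by_name = soup.find_all("comment")

# A function passed as string= is called on each string in the tree
by_type = soup.find_all(string=lambda s: isinstance(s, Comment))

print(len(by_name))  # number of <comment> tags found
print(len(by_type))  # number of comment nodes found
```

Since the sample contains no tag named comment, the first search comes back empty, while the second finds the single comment node.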
