How to find all comments with Beautiful Soup

Question

This question was asked four years ago, but the answer is now out of date for BS4.

I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 treats each comment as a special type of NavigableString, I thought this code would work:

for comments in soup.find_all('comment'):
     comments.decompose()

That didn't work. How do I find all comments using BS4?

Recommended answer

You can pass a function to find_all() to help it check whether the string is a Comment.

For example, given the following HTML:

<body>
   <!-- Branding and main navigation -->
   <div class="Branding">The Science &amp; Safety Behind Your Favorite Products</div>
   <div class="l-branding">
      <p>Just a brand</p>
   </div>
      <!-- test comment here -->
      <div class="block_content">
          <a href="https://www.google.com">Google</a>
   </div>
</body>

Code:

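from bs4 import BeautifulSoup as BS
from bs4 import Comment
# ... (html is assumed to hold the markup shown above)
soup = BS(html, 'html.parser')
# Comment is a NavigableString subclass, so filter strings by type.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    print(c)
    print("===========")
    c.extract()  # detach the comment node from the tree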

The output will be:

 Branding and main navigation 
===========
 test comment here 
===========
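The snippet above leaves `html` undefined, so here is a minimal self-contained sketch of the same approach. The helper name `strip_comments` and the shortened sample markup are assumptions made for illustration; they are not part of the original answer.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html):
    """Hypothetical helper: return (cleaned_html, removed_comment_texts)."""
    soup = BeautifulSoup(html, "html.parser")
    # Match string nodes by type; Comment is a NavigableString subclass.
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    removed = [c.strip() for c in comments]
    for c in comments:
        c.extract()  # detach the comment node from the tree
    return str(soup), removed


sample = """<body>
   <!-- Branding and main navigation -->
   <div class="l-branding"><p>Just a brand</p></div>
   <!-- test comment here -->
</body>"""

cleaned, removed = strip_comments(sample)
print(removed)            # ['Branding and main navigation', 'test comment here']
print("<!--" in cleaned)  # False
```

Collecting the comments into a list before removing them avoids modifying the tree while it is still being searched.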

By the way, I think the reason find_all('Comment') doesn't work is this (from the Beautiful Soup documentation):


Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don’t match.
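To make the difference concrete, the following sketch runs both kinds of search against one small document. The markup, including the literal `<comment>` element, is invented purely for illustration: the name filter finds only that element, while the type-based string filter finds the actual HTML comment.

```python
from bs4 import BeautifulSoup, Comment

# Invented markup: one real HTML comment plus an element that happens
# to be named "comment".
markup = "<div><!-- a comment --><comment>an element</comment></div>"
soup = BeautifulSoup(markup, "html.parser")

# A bare string argument is a tag-name filter, so only elements can match.
tags = soup.find_all("comment")

# A function passed as string= is applied to string nodes instead.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

print([t.name for t in tags])          # ['comment']
print([c.strip() for c in comments])   # ['a comment']
```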
