Excluding unwanted results of findAll using BeautifulSoup

Views: 380 Published: 2020/9/20 6:50:00 Tags: python beautifulsoup screen-scraping


Question

Using BeautifulSoup, I am trying to scrape the text associated with this HTML hook:

<p class="review_comment">

So, using the following simple code,

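from bs4 import BeautifulSoup

content = page.read()  # page: the already-opened HTTP response (not shown)
soup = BeautifulSoup(content, "html.parser")  # name the parser explicitly
results = soup.find_all("p", "review_comment")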

I am happily parsing the text that lives here:

<p class="review_comment">
    This place is terrible!</p>

The bad news is that about once in every 30 matches, soup.find_all also grabs something I really don't want: a user's old review that they have since updated:

<p class="review_comment">
    It's 1999, and I will always love this place…  
<a href="#" class="show-archived">Read more &raquo;</a></p>

In my attempts to exclude these old duplicate reviews, I have tried a hodgepodge of ideas.

I've been trying to alter the arguments in my soup.find_all() call to specifically exclude any text that comes before the <a href="#" class="show-archived">Read more »</a>

I've been lost in regular-expression matching limbo, with no success.

I can't seem to take advantage of the class="show-archived" attribute.

Any ideas would be greatly appreciated. Thanks in advance.

Recommended answer

Is this what you are looking for?

for p in soup.find_all("p", "review_comment"):
    if p.find(class_='show-archived'):
        continue
    # p is now a wanted p
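
To see the idea end to end, here is a minimal sketch that applies the same check to a small sample page. The sample HTML is made up from the two snippets in the question, and the names html, wanted, texts and selected are only illustrative. The CSS-selector variant at the end is an alternative that assumes a Beautiful Soup version with select() backed by Soup Sieve (4.7 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical sample page assembled from the two snippets in the question.
html = """
<p class="review_comment">
    This place is terrible!</p>
<p class="review_comment">
    It's 1999, and I will always love this place…
<a href="#" class="show-archived">Read more &raquo;</a></p>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the paragraphs that do NOT contain a "show-archived" element.
wanted = [
    p for p in soup.find_all("p", class_="review_comment")
    if p.find(class_="show-archived") is None
]
texts = [p.get_text(strip=True) for p in wanted]
print(texts)

# Alternative: the same filter written as a CSS selector
# (assumes select() is backed by Soup Sieve, i.e. bs4 4.7+).
selected = soup.select("p.review_comment:not(:has(a.show-archived))")
```

Either way, the archived review is skipped because its paragraph contains the "Read more" link, so the check is done on the structure of the paragraph and no regular expression is needed.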
