Python regular expression for Beautiful Soup

Posted: 2021/12/23 20:02:02 · Tags: python, regex, beautifulsoup


Question


I am using Beautiful Soup to pull out specific div tags, and it seems I can't use simple string matching.

The page has some

<div class="comment form new"...>

which I want to ignore, and some

<div class="comment comment-xxxx...">

where the x's represent an integer of arbitrary length, and the ellipses represents an arbitrary number of other values separated by white spaces (that I'm not concerned about). I can't figure out the correct regex expression, especially since I've never used python's re class.

Using

soup.find_all(class_="comment")

finds all tags that begin with the word comment. I've tried using

soup.find_all(class_=re.compile(r'(comment)( )(comment)'))
soup.find_all(class_=re.compile(r'comment comment.*'))

and lots of other variations, but I think I'm missing something obvious here about how regex expressions or match() work. Can anyone help me out?

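Answer

I think I've got it. In BS4, class is a multi-valued attribute, so each div's class value is stored as a list of separate names, e.g. ['comment', 'form', 'new'] and ['comment', 'comment-xxxx...'].

Note that, unlike the equivalent in BS3, it's not something like: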

['comment form new', 'comment comment-xxxx...']

And that's why your regexps won't match.
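To see why a pattern containing a space fails when it is compared name by name, here is a stdlib-only sketch. It assumes, as the answer describes, that the regex is tested against each class name on its own; the helper matches_any_class and the sample class values are hypothetical illustrations, not Beautiful Soup's own code.

```python
import re

# Sample class attribute values (made up for illustration).
raw_values = ["comment form new", "comment comment-12345 extra"]

# BS4 splits multi-valued attributes such as class on whitespace;
# str.split() mimics that here.
class_lists = [value.split() for value in raw_values]


def matches_any_class(pattern, classes):
    """Hypothetical helper: True if the regex is found in any single class name.

    It models only the per-name comparison described in the answer,
    not Beautiful Soup's actual matching code.
    """
    return any(pattern.search(name) for name in classes)


question_pattern = re.compile(r"comment comment.*")  # contains a literal space
answer_pattern = re.compile(r"comment-")

for classes in class_lists:
    print(classes,
          matches_any_class(question_pattern, classes),
          matches_any_class(answer_pattern, classes))
```

No single name contains a space, so the first pattern can never match any of them, while 'comment-' is found inside 'comment-12345'.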

But you can match, e.g., this:

>>> soup.find_all('div', class_=re.compile('comment-'))
[<div class="comment comment-xxxx..."></div>]
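A self-contained version of that session, as a sketch: it assumes the bs4 package is installed and uses the built-in "html.parser"; the three sample divs and their class values are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Sample markup (hypothetical class values for illustration).
html = """
<div class="comment form new">ignore me</div>
<div class="comment comment-12345 reply">keep me</div>
<div class="comment comment-of-another-kind">not numbered</div>
"""

soup = BeautifulSoup(html, "html.parser")

# In BS4 the class attribute comes back as a list of names.
for div in soup.find_all("div"):
    print(div["class"])

# The pattern is searched for within the class value, so "comment-" alone is enough.
with_dash = soup.find_all("div", class_=re.compile("comment-"))
# Requiring digits after the hyphen excludes "comment-of-another-kind".
numbered = soup.find_all("div", class_=re.compile(r"comment-\d+"))

print([div.get_text() for div in with_dash])
print([div.get_text() for div in numbered])
```

The first find_all skips the "comment form new" div because none of its classes contains "comment-"; the second additionally drops the div whose suffix is not numeric.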

Note that BS does the equivalent of re.search, not re.match, so you don't need 'comment-.*'. Of course, if you want to match 'comment-12345' but not 'comment-of-another-kind', you'd want, e.g., 'comment-\d+'.
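The difference between re.search and re.match, and what \d+ adds, can be checked with the standard re module alone. This is a sketch; the class names are made up for illustration, and the anchored pattern is an extra option, not part of the original answer.

```python
import re

# Hypothetical class names to test against.
names = ["comment", "comment-12345", "comment-of-another-kind",
         "x-comment-7", "comment-12abc"]

loose = re.compile(r"comment-")        # "comment-" anywhere in the name
numeric = re.compile(r"comment-\d+")   # "comment-" followed by one or more digits
exact = re.compile(r"^comment-\d+$")   # the whole name must be comment-<digits>

# re.search scans the whole string, so no trailing ".*" is needed.
found_loose = [n for n in names if loose.search(n)]
found_numeric = [n for n in names if numeric.search(n)]
# re.match only tries at position 0, so "x-comment-7" is skipped.
matched_numeric = [n for n in names if numeric.match(n)]
found_exact = [n for n in names if exact.search(n)]

print(found_loose)      # ['comment-12345', 'comment-of-another-kind', 'x-comment-7', 'comment-12abc']
print(found_numeric)    # ['comment-12345', 'x-comment-7', 'comment-12abc']
print(matched_numeric)  # ['comment-12345', 'comment-12abc']
print(found_exact)      # ['comment-12345']
```

Without anchors, 'comment-\d+' still accepts names like 'comment-12abc', since search only needs the digits to appear after the hyphen; the ^...$ form rejects them.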
