Python regular expression for Beautiful Soup
Problem description
I am using Beautiful Soup to pull out specific div tags, and it seems I can't use simple string matching.
The page has some tags in the form of
<div class="comment form new"...>
which I want to ignore, and also some tags in the form of
<div class="comment comment-xxxx...">
where the x's represent an integer of arbitrary length, and the ellipsis represents an arbitrary number of other whitespace-separated values (which I'm not concerned about). I can't figure out the correct regular expression, especially since I've never used Python's re module.
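To make the target concrete, here is a minimal plain-re sketch of the check being described. It assumes the class attribute is available as one whitespace-separated string, and the names NUMBERED_COMMENT and has_numbered_comment_class are hypothetical, not part of Beautiful Soup. It splits the value into tokens and requires one token to be exactly 'comment-' followed by digits.

```python
import re

# Hypothetical pattern for illustration: a class token that is exactly
# "comment-" followed by one or more digits.
NUMBERED_COMMENT = re.compile(r"comment-\d+")


def has_numbered_comment_class(class_attr: str) -> bool:
    """Return True if any whitespace-separated token of a class attribute
    value is exactly 'comment-<digits>'."""
    return any(NUMBERED_COMMENT.fullmatch(token) for token in class_attr.split())


if __name__ == "__main__":
    print(has_numbered_comment_class("comment comment-12345 extra"))  # True
    print(has_numbered_comment_class("comment form new"))  # False
```

Working per token (rather than on the whole string) is what lets the "other values" in the ellipsis be ignored without trying to describe them in the pattern.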
Using
soup.find_all(class_="comment")
finds all tags starting with the word "comment". I have tried using
soup.find_all(class_=re.compile(r'(comment)( )(comment)'))
soup.find_all(class_=re.compile(r'comment comment.*'))
and lots of other variations, but I think I'm missing something obvious here about how regular expressions or match() work. Can anyone help me out?
Recommended answer
I think I've got it:
>>> [div['class'] for div in soup.find_all('div')]
[['comment', 'form', 'new'], ['comment', 'comment-xxxx...']]
Notice that, unlike the equivalent in BS3, it's not this:
['comment form new', 'comment comment-xxxx...']
And that's why your regexps won't match.
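A small plain-re illustration of this point, which does not call Beautiful Soup itself: it assumes, as described above, that each class value is tested separately, and contrasts that with testing the space-joined string. The number 12345 stands in for the question's "xxxx" and is made up.

```python
import re

# Class values of the second div as a list (the BS4 shape shown above),
# with a made-up number in place of "xxxx".
tokens = ["comment", "comment-12345"]
# The single-string form BS3 exposed: "comment comment-12345".
joined = " ".join(tokens)

asker_patterns = [r"(comment)( )(comment)", r"comment comment.*"]

# For each pattern: (match per individual token, match on the joined string).
outcomes = {
    p: ([re.search(p, t) is not None for t in tokens], re.search(p, joined) is not None)
    for p in asker_patterns
}

for pattern, (per_token, on_joined) in outcomes.items():
    print(pattern, per_token, on_joined)
```

Both of the asker's patterns require a literal space, which no single class token contains, so they can only succeed against the joined string.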
But you can match, e.g., this:
>>> soup.find_all('div', class_=re.compile('comment-'))
[<div class="comment comment-xxxx..."></div>]
Note that BS does the equivalent of re.search, not re.match, so you don't need 'comment-.*'. Of course, if you want to match 'comment-12345' but not 'comment-of-another-kind', you'd want, e.g., 'comment-\d+'.
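The re.search/re.match difference and the effect of '\d+' can be checked with the standard library alone. This sketch uses made-up strings ('my-comment-7', 'comment-12345') and does not involve Beautiful Soup.

```python
import re

# re.match only tries the pattern at position 0; re.search scans the string.
anchored = re.match(r"comment-", "my-comment-7")   # None
scanned = re.search(r"comment-", "my-comment-7")   # match starting at index 3

# With search semantics a trailing ".*" does not change *whether* a token
# matches, while "\d+" rejects tokens with no digits after "comment-".
patterns = {"plain": r"comment-", "dot_star": r"comment-.*", "digits": r"comment-\d+"}
tokens = ["comment-12345", "comment-of-another-kind"]

results = {
    token: {name: re.search(p, token) is not None for name, p in patterns.items()}
    for token in tokens
}

print(anchored)         # None
print(scanned.start())  # 3
for token, row in results.items():
    print(token, row)
```

Because search is unanchored, 'comment-\d+' also accepts a token such as 'comment-12abc'; re.fullmatch, or the anchored pattern '^comment-\d+$', would reject it.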