Matching id's in BeautifulSoup

Question

I'm using BeautifulSoup, the Python module. I need to find all div elements whose id has the form 'post-#'. For example:

<div id="post-45">...</div>
<div id="post-334">...</div>

How can I filter this?

html = '<div id="post-45">...</div> <div id="post-334">...</div>'
soupHandler = BeautifulSoup(html)
print soupHandler.findAll('div', id='post-*')
> []

Solution

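You can pass a function to findAll. It is called with each tag's id value, or None when the tag has no id, which is why the lambda checks x before calling startswith: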

>>> print soupHandler.findAll('div', id=lambda x: x and x.startswith('post-'))
[<div id="post-45">...</div>, <div id="post-334">...</div>]
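
For readers on Python 3 with the current `bs4` package (an assumption; the snippets in this thread use Python 2 syntax and the older `findAll` spelling), the same function filter might look like the sketch below. The names `has_post_prefix` and `post_divs` are illustrative, and the last two `div` elements are extra sample data added here to show non-matching tags.

```python
from bs4 import BeautifulSoup

# Sample markup: the two divs from the question plus two hypothetical non-matches.
html = (
    '<div id="post-45">...</div> <div id="post-334">...</div> '
    '<div id="sidebar">...</div> <div>...</div>'
)
soup = BeautifulSoup(html, "html.parser")


def has_post_prefix(value):
    # value is the tag's id attribute, or None when the tag has no id.
    return value is not None and value.startswith("post-")


post_divs = soup.find_all("div", id=has_post_prefix)
print([div["id"] for div in post_divs])  # ['post-45', 'post-334']
```

A named function is used instead of a lambda only to make the None check easier to read; the behaviour is the same.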

Or a regular expression:

>>> import re
>>> print soupHandler.findAll('div', id=re.compile('^post-'))
[<div id="post-45">...</div>, <div id="post-334">...</div>]
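
As a rough Python 3 / `bs4` counterpart of the regular-expression approach (again an assumption about the reader's environment), the sketch below compares the prefix pattern from the answer with a stricter one. The stricter pattern assumes that '#' in the question stands for one or more digits, and the `post-draft` id is hypothetical sample data added to show the difference.

```python
import re

from bs4 import BeautifulSoup

# The two divs from the question plus one hypothetical id that is not numeric.
html = (
    '<div id="post-45">...</div> <div id="post-334">...</div> '
    '<div id="post-draft">...</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Same pattern as the answer: any id that starts with "post-".
prefix_matches = soup.find_all("div", id=re.compile(r"^post-"))

# Stricter variant: "post-" followed only by digits.
numeric_matches = soup.find_all("div", id=re.compile(r"^post-\d+$"))

print([div["id"] for div in prefix_matches])   # ['post-45', 'post-334', 'post-draft']
print([div["id"] for div in numeric_matches])  # ['post-45', 'post-334']
```

The prefix pattern is enough when every id beginning with "post-" is wanted; the anchored digit pattern is one way to exclude ids such as `post-draft`.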
