Complex Beautiful Soup query
Question
Here is a snippet of an HTML file I'm exploring with Beautiful Soup.
<td width="50%">
<strong class="sans"><a href="http:/website">Site</a></strong> <br />
I would like to get the <a href> for any line which has the <strong class="sans"> and which is inside a <td width="50%">.
Is it possible to query an HTML file for those multiple conditions using Beautiful Soup?
Recommended answer
BeautifulSoup's search mechanisms accept a callable, which the docs appear to recommend for your case: "If you need to impose complex or interlocking restrictions on a tag's attributes, pass in a callable object for name,...". (OK, they're talking about attributes specifically, but the advice reflects the underlying spirit of the BeautifulSoup API.)
If you want a one-liner:
soup.findAll(lambda tag: tag.name == 'a' and \
tag.findParent('strong', 'sans') and \
tag.findParent('strong', 'sans').findParent('td', attrs={'width':'50%'}))
I've used a lambda in this example, but in practice you may want to define a named function if you have multiple chained requirements, because this lambda has to make two findParent('strong', 'sans') calls to avoid raising an exception when an <a> tag has no strong parent. With a proper function, you could make the test more efficient.
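As a rough sketch of the named-function approach, the version below looks up the strong parent only once and then checks for the enclosing td. It assumes the bs4 package with its snake_case method names (find_all, find_parent) and the built-in 'html.parser'. The names is_wanted_link, wanted_hrefs and the sample HTML (including the second and third cells) are made up for illustration and are not from the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: only the first cell matches all three conditions.
HTML = """
<table><tr>
<td width="50%">
<strong class="sans"><a href="http:/website">Site</a></strong> <br />
</td>
<td width="25%">
<strong class="sans"><a href="http:/other">Other</a></strong>
</td>
<td width="50%"><a href="http:/plain">Plain</a></td>
</tr></table>
"""


def is_wanted_link(tag):
    """Return True for an <a> inside <strong class="sans"> inside <td width="50%">."""
    if tag.name != 'a':
        return False
    # Look up the <strong class="sans"> ancestor a single time.
    strong = tag.find_parent('strong', class_='sans')
    if strong is None:
        return False
    # The <strong> itself must sit somewhere inside a <td width="50%">.
    return strong.find_parent('td', attrs={'width': '50%'}) is not None


def wanted_hrefs(html):
    """Collect the href of every link accepted by is_wanted_link."""
    soup = BeautifulSoup(html, 'html.parser')
    return [a.get('href') for a in soup.find_all(is_wanted_link)]


if __name__ == '__main__':
    print(wanted_hrefs(HTML))
```

Because the function returns early when there is no strong parent, it never calls find_parent on None, which is the exception the lambda had to guard against with its duplicated call.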