Complex Beautiful Soup query
Question
Here is a snippet of an HTML file I'm exploring with Beautiful Soup.
```html
<td width="50%">
  <strong class="sans"><a href="http:/website">Site</a></strong> <br />
```
I would like to get the <a href> for any line which has the <strong class="sans"> and which is inside a <td width="50%">.
Is it possible to query an HTML file for those multiple conditions using Beautiful Soup?
Recommended answer
BeautifulSoup's search mechanisms accept a callable, which the docs appear to recommend for your case: "If you need to impose complex or interlocking restrictions on a tag's attributes, pass in a callable object for name, ...". (OK... they're talking about attributes specifically, but the advice reflects the underlying spirit of the BeautifulSoup API.)
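A minimal sketch of that mechanism, assuming bs4 (BeautifulSoup 4) with the built-in "html.parser"; the helper name is_sans_strong and the second <strong class="serif"> element are hypothetical additions for illustration. find_all() calls the function once per tag and keeps every tag for which it returns a truthy value.

```python
from bs4 import BeautifulSoup

# The question's snippet plus one hypothetical non-matching <strong>.
html = """
<td width="50%">
  <strong class="sans"><a href="http:/website">Site</a></strong> <br />
</td>
<strong class="serif">Not this one</strong>
"""

soup = BeautifulSoup(html, "html.parser")


def is_sans_strong(tag):
    # Hypothetical filter: find_all() passes each Tag in turn.
    # bs4 stores "class" as a list of values, hence the membership test.
    return tag.name == "strong" and "sans" in tag.get("class", [])


matches = soup.find_all(is_sans_strong)
print([t.get_text() for t in matches])  # ['Site']
```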
If you want a one-liner:
```python
soup.findAll(lambda tag: tag.name == 'a' and
             tag.findParent('strong', 'sans') and
             tag.findParent('strong', 'sans').findParent('td', attrs={'width': '50%'}))
```
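To check the one-liner, here is a sketch that runs it against a small table. It assumes bs4 and "html.parser"; apart from the first cell (the question's snippet), the markup is invented for illustration so that only one link meets all three conditions.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: only the first link satisfies every condition.
html = """
<table><tr>
  <td width="50%">
    <strong class="sans"><a href="http:/website">Site</a></strong> <br />
  </td>
  <td width="25%">
    <strong class="sans"><a href="http:/narrow">Narrow</a></strong>
  </td>
  <td width="50%">
    <a href="http:/plain">Plain</a>
  </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# The answer's one-liner. findAll/findParent are the BeautifulSoup 3 names,
# still accepted in bs4 as aliases of find_all/find_parent. A plain string
# passed as the second (attrs) argument is treated as a CSS class.
links = soup.findAll(lambda tag: tag.name == 'a' and
                     tag.findParent('strong', 'sans') and
                     tag.findParent('strong', 'sans').findParent('td', attrs={'width': '50%'}))

print([a['href'] for a in links])  # ['http:/website']
```

The second cell fails the width test and the third link has no strong parent, so only http:/website is returned.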
I've used a lambda in this example, but in practice you may want to define a callable function if you have multiple chained requirements, as this lambda has to make two findParent('strong', 'sans') calls to avoid raising an exception if an <a> tag has no strong parent. Using a proper function, you could make the test more efficient.
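One way to write such a function is sketched below, assuming bs4; link_in_sans_half_cell is a hypothetical name. It looks up the <strong class="sans"> ancestor once, keeps the result, and returns early when it is missing, so nothing is ever called on None.

```python
from bs4 import BeautifulSoup


def link_in_sans_half_cell(tag):
    """Hypothetical filter: True for an <a> inside <strong class="sans">
    that is itself inside <td width="50%">."""
    if tag.name != 'a':
        return False
    # Single ancestor lookup, reused below instead of being repeated.
    strong = tag.find_parent('strong', class_='sans')
    if strong is None:
        return False
    return strong.find_parent('td', attrs={'width': '50%'}) is not None


# Hypothetical sample: the second cell has the wrong width.
html = """
<table><tr>
  <td width="50%"><strong class="sans"><a href="http:/website">Site</a></strong></td>
  <td width="25%"><strong class="sans"><a href="http:/narrow">Narrow</a></strong></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all(link_in_sans_half_cell)
print([a['href'] for a in links])  # ['http:/website']
```

This version uses the bs4 spellings find_all/find_parent and the class_ keyword; it selects the same tags as the lambda, and further conditions can be added as extra early returns.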