Get text with BeautifulSoup CSS Selector
Question
Sample HTML:
<h2 id="name">
ABC
<span class="numbers">123</span>
<span class="lower">abc</span>
</h2>
I can get the numbers with something like:
soup.select('#name > span.numbers')[0].text
How do I get the text ABC using BeautifulSoup and the select function?
And what about this case?
<div id="name">
<div id="numbers">123</div>
ABC
</div>
Recommended answer
In the first case, get the previous sibling:
soup.select_one('#name > span.numbers').previous_sibling
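As a minimal sketch of the first case, assuming the sample HTML from the question and the built-in html.parser backend: the previous sibling is a text node that still carries the surrounding newlines, so it is stripped before use. The variable names here are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup from the question (first case).
html = """
<h2 id="name">
ABC
<span class="numbers">123</span>
<span class="lower">abc</span>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# The text "ABC" is the node immediately before the first span.
span = soup.select_one("#name > span.numbers")
text_node = span.previous_sibling

# The raw node includes the newlines around the text, so strip it.
raw_text = str(text_node)        # '\nABC\n'
leading_text = text_node.strip()
print(leading_text)              # ABC
```

This relies on the text sitting directly before the span; if another tag or a comment were placed between them, previous_sibling would return that node instead.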
In the second case, get the next sibling:
soup.select_one('#name > #numbers').next_sibling
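The second case can be sketched the same way, again assuming the sample HTML from the question and the html.parser backend. Here the text follows the inner div, so the next sibling is the node of interest.

```python
from bs4 import BeautifulSoup

# Sample markup from the question (second case).
html = """
<div id="name">
<div id="numbers">123</div>
ABC
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The text "ABC" is the node immediately after the inner div.
inner = soup.select_one("#name > #numbers")
text_node = inner.next_sibling

# Strip the newlines that surround the text in the source markup.
trailing_text = text_node.strip()
print(trailing_text)  # ABC
```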
Note that I assume it is intentional that numbers is an id value here and that the tag is div instead of span. Hence, I've adjusted the CSS selector.
To cover both cases, you can go to the parent of the tag and find the first non-empty text node among its direct children (non-recursive mode):
parent = soup.select_one('#name > .numbers,#numbers').parent
print(parent.find(string=lambda text: text and text.strip(), recursive=False).strip())
Note the change in the selector: it now matches either the numbers id or the numbers class.
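To check the combined approach against both samples, it can be wrapped in a small helper. This is only a sketch: direct_text is a hypothetical name, the html.parser backend is an assumption, and it uses the string= keyword, which Beautiful Soup 4.4 and later accept in place of the older text=. It returns None instead of raising when nothing matches.

```python
from bs4 import BeautifulSoup

def direct_text(html):
    """Return the first non-empty text node directly under #name, or None.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Match either a .numbers child of #name or any element with id="numbers".
    tag = soup.select_one("#name > .numbers, #numbers")
    if tag is None:
        return None

    # Search only the parent's direct children for a non-blank string.
    node = tag.parent.find(string=lambda s: s and s.strip(), recursive=False)
    return node.strip() if node is not None else None

first = """
<h2 id="name">
ABC
<span class="numbers">123</span>
<span class="lower">abc</span>
</h2>
"""

second = """
<div id="name">
<div id="numbers">123</div>
ABC
</div>
"""

print(direct_text(first))   # ABC
print(direct_text(second))  # ABC
```

Only the first non-blank text node is returned, so a parent with several separate pieces of direct text would need find_all with the same arguments instead.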
That said, I have a feeling this universal solution would not be very reliable because, for starters, I don't know what your real inputs could look like.