Get text with BeautifulSoup CSS Selector

Question

Sample HTML:

<h2 id="name">
    ABC
    <span class="numbers">123</span>
    <span class="lower">abc</span>
</h2>

I can get the numbers with something like:

soup.select('#name > span.numbers')[0].text

How do I get the text ABC using BeautifulSoup and the select function?

And what about this case?

<div id="name">
    <div id="numbers">123</div> 
    ABC
</div>

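Recommended answer

In the first case, get the previous sibling of the matched span. The result is a text node that still includes the surrounding whitespace, so call .strip() on it if you only want ABC: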

soup.select_one('#name > span.numbers').previous_sibling
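
As a minimal, self-contained sketch of the first case (the `html.parser` backend and the variable names are my own choices for illustration, not part of the original question):

```python
from bs4 import BeautifulSoup

# Sample markup from the question (first case).
html = """
<h2 id="name">
    ABC
    <span class="numbers">123</span>
    <span class="lower">abc</span>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# The text "ABC" is not inside any child tag; it is a text node that sits
# immediately before the first <span>, so previous_sibling reaches it.
node = soup.select_one("#name > span.numbers").previous_sibling

# The node keeps the newlines and indentation around "ABC", so strip it.
first_case_text = node.strip()
print(first_case_text)  # ABC
```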

In the second case, get the next sibling:

soup.select_one('#name > #numbers').next_sibling
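
The same idea for the second case, again as a hedged sketch with an assumed `html.parser` backend and illustrative variable names:

```python
from bs4 import BeautifulSoup

# Sample markup from the question (second case).
html = """
<div id="name">
    <div id="numbers">123</div>
    ABC
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Here the text follows the inner <div>, so next_sibling is the text node.
node = soup.select_one("#name > #numbers").next_sibling

# Strip the surrounding whitespace to leave only the text itself.
second_case_text = node.strip()
print(second_case_text)  # ABC
```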

Note that I assume it is intentional that numbers is an id value here and that the tag is a div rather than a span, so I've adjusted the CSS selector accordingly.

To cover both cases, you can go to the parent of the tag and find the first non-empty text node among its direct children (non-recursive mode):

parent = soup.select_one('#name > .numbers,#numbers').parent
# string= is the current name of the older text= argument (Beautiful Soup 4.4.0+)
print(parent.find(string=lambda text: text and text.strip(), recursive=False).strip())
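
To check that the combined approach handles both samples, here is a small sketch that wraps it in a helper. The function name `own_text`, the `None` fallbacks, and the `html.parser` backend are assumptions added for illustration. It uses the `string=` argument, which is the name Beautiful Soup 4.4.0 and later use for what was previously `text=`.

```python
from bs4 import BeautifulSoup

FIRST = """
<h2 id="name">
    ABC
    <span class="numbers">123</span>
    <span class="lower">abc</span>
</h2>
"""

SECOND = """
<div id="name">
    <div id="numbers">123</div>
    ABC
</div>
"""


def own_text(html):
    """Return the first non-empty direct text node next to the 'numbers' tag."""
    soup = BeautifulSoup(html, "html.parser")
    # Match either a .numbers child of #name or any element with id="numbers".
    tag = soup.select_one("#name > .numbers, #numbers")
    if tag is None:
        return None  # hypothetical fallback: neither pattern matched
    # recursive=False restricts the search to the parent's direct children,
    # so text inside the nested tags ("123", "abc") is never considered.
    node = tag.parent.find(
        string=lambda s: bool(s and s.strip()), recursive=False
    )
    return node.strip() if node is not None else None


print(own_text(FIRST))   # ABC
print(own_text(SECOND))  # ABC
```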

Note the change in the selector: we are asking it to match either the numbers id or the numbers class.

That said, I suspect this universal solution is not very reliable because, for starters, I don't know what your real inputs look like.
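
If the only stable part of the real input is the `#name` container, one possible alternative is to select the container itself and collect its direct text nodes, without relying on a `numbers` child at all. This is only a sketch under that assumption; `direct_text` is a hypothetical helper name, and the choice to join multiple text fragments with a space is mine.

```python
from bs4 import BeautifulSoup, NavigableString


def direct_text(tag):
    """Join the non-empty text nodes that are direct children of tag."""
    parts = []
    for child in tag.children:
        # Child tags are skipped; only bare text nodes are kept.
        # (Comments are also NavigableString instances and would be included.)
        if isinstance(child, NavigableString) and child.strip():
            parts.append(child.strip())
    return " ".join(parts)


first = BeautifulSoup(
    '<h2 id="name">\n ABC\n <span class="numbers">123</span>\n'
    ' <span class="lower">abc</span>\n</h2>',
    "html.parser",
)
second = BeautifulSoup(
    '<div id="name">\n <div id="numbers">123</div>\n ABC\n</div>',
    "html.parser",
)

print(direct_text(first.select_one("#name")))   # ABC
print(direct_text(second.select_one("#name")))  # ABC
```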
