Get text with BeautifulSoup CSS Selector

Views: 459 | Published: 2020/8/10 19:46:25 | Tags: python, python-2.7, css-selectors, beautifulsoup, html-parsing

Question

Sample HTML:

<h2 id="name">
    ABC
    <span class="numbers">123</span>
    <span class="lower">abc</span>
</h2>

I can get the numbers with something like:

soup.select('#name > span.numbers')[0].text
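
For reference, here is a minimal runnable sketch of the setup behind the line above. It assumes the sample HTML is held in a string and parsed with Python's bundled `html.parser` (the question does not say which parser is used). It also shows that reading the text of the `<h2>` itself returns the text of every descendant, not only `ABC`.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<h2 id="name">
    ABC
    <span class="numbers">123</span>
    <span class="lower">abc</span>
</h2>
"""

# Assumption: "html.parser" (the standard-library parser) is good enough here.
soup = BeautifulSoup(html, "html.parser")

# The selector from the question: the span.numbers child of #name.
numbers = soup.select('#name > span.numbers')[0].text
print(numbers)  # 123

# Text of the whole <h2>: the strings of all descendants, stripped and joined.
all_text = soup.select_one('#name').get_text(" ", strip=True)
print(all_text)  # ABC 123 abc
```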

How do I get the text ABC using BeautifulSoup and the select function?

And what about this case?

<div id="name">
    <div id="numbers">123</div> 
    ABC
</div>

Recommended answer

In the first case, get the previous sibling of the matched span, which is the text node holding ABC (with its surrounding whitespace):

soup.select_one('#name > span.numbers').previous_sibling
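
As a runnable sketch of the first case: the snippet below assumes the question's first HTML sample and the standard-library `html.parser`. The text node comes back with the newlines and indentation around it, so the sketch adds a `.strip()` call that is not in the line above.

```python
from bs4 import BeautifulSoup

# First sample from the question.
html = """
<h2 id="name">
    ABC
    <span class="numbers">123</span>
    <span class="lower">abc</span>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# The node right before <span class="numbers"> is the text node "\n    ABC\n    ".
span = soup.select_one('#name > span.numbers')
label = span.previous_sibling.strip()
print(label)  # ABC
```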

In the second case, get the next sibling:

soup.select_one('#name > #numbers').next_sibling
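
The second case can be sketched the same way, again assuming the question's HTML sample and `html.parser`. Here the wanted text follows the matched element, so `next_sibling` is used, with `.strip()` added to drop the surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Second sample from the question.
html = """
<div id="name">
    <div id="numbers">123</div>
    ABC
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The node right after <div id="numbers"> is the text node containing ABC.
inner = soup.select_one('#name > #numbers')
label = inner.next_sibling.strip()
print(label)  # ABC
```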

Note that I assume it is intentional that numbers is an id value here and that the tag is a div instead of a span, so I've adjusted the CSS selector accordingly.

To cover both cases, you can go to the parent of the tag and find the first non-empty text node non-recursively, that is, among the parent's direct children only:

parent = soup.select_one('#name > .numbers,#numbers').parent
print(parent.find(text=lambda text: text and text.strip(), recursive=False).strip())

Note the change in the selector: it now matches either an element with the numbers class (as a direct child of #name) or the element with the numbers id.
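
The combined lookup can be sketched as a small function and run against both samples. The function name `first_own_text` is hypothetical, and `html.parser` is an assumed parser choice. The sketch passes the filter as `string=`, which is the name Beautiful Soup 4.4.0 and later use for the `text=` argument shown above, and it returns `None` instead of raising when nothing matches.

```python
from bs4 import BeautifulSoup

FIRST = """
<h2 id="name">
    ABC
    <span class="numbers">123</span>
    <span class="lower">abc</span>
</h2>
"""

SECOND = """
<div id="name">
    <div id="numbers">123</div>
    ABC
</div>
"""


def first_own_text(html):
    """Return the first non-empty direct text node next to the numbers element."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one('#name > .numbers,#numbers')
    if tag is None:
        return None
    # recursive=False restricts the search to the parent's direct children,
    # so text nested inside the child tags (123, abc) is never considered.
    node = tag.parent.find(string=lambda s: s and s.strip(), recursive=False)
    return node.strip() if node is not None else None


print(first_own_text(FIRST))   # ABC
print(first_own_text(SECOND))  # ABC
```

The function only returns the first such text node; markup with several loose text fragments under the parent would need a different approach.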

That said, I suspect this universal solution is not very reliable because, for a start, I don't know what your real inputs look like.
