How to extract the text inside a tag with BeautifulSoup in Python?

Views: 104 Published: 2020/9/20 7:09:29 Tags: python beautifulsoup

Problem description

Suppose I have an HTML string like this:

<html>
    <div id="d1">
        Text 1
    </div>
    <div id="d2">
        Text 2
        <a href="http://my.url/">a url</a>
        Text 2 continue
    </div>
    <div id="d3">
        Text 3
    </div>
</html>

I want to extract the content of d2 that is NOT wrapped by other tags, skipping the link text "a url". In other words, I want to get this result:

Text 2
Text 2 continue

Is there a way to do this with BeautifulSoup?

I tried this, but it is not correct:

from bs4 import BeautifulSoup

# html_doc holds the HTML string shown above
soup = BeautifulSoup(html_doc, 'html.parser')
s = soup.find(id='d2').text
print(s)
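
For reference, a minimal sketch of why this attempt falls short: `.text` collects every string below the tag, including the text of the nested `<a>`. The `html_doc` value here is assumed to be the HTML string from the question, and the name `all_text` is only illustrative.

```python
from bs4 import BeautifulSoup

# Assumed to be the same HTML string as in the question.
html_doc = """
<html>
    <div id="d1">
        Text 1
    </div>
    <div id="d2">
        Text 2
        <a href="http://my.url/">a url</a>
        Text 2 continue
    </div>
    <div id="d3">
        Text 3
    </div>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# .text gathers the strings of the tag and of all its descendants,
# so the link text "a url" ends up in the result as well.
all_text = soup.find(id='d2').text
print(all_text)
```

The printed text contains "a url" between the two wanted lines, which is exactly the part that should be skipped.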

Recommended answer

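Try using .find_all(text=True, recursive=False). text=True matches only text nodes, and recursive=False restricts the search to the direct children of the tag, so the text inside the nested <a> is skipped: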

from bs4 import BeautifulSoup
div_test="""
<html>
    <div id="d1">
        Text 1
    </div>
    <div id="d2">
        Text 2
        <a href="http://my.url/">a url</a>
        Text 2 continue
    </div>
    <div id="d3">
        Text 3
    </div>
</html>
"""
soup = BeautifulSoup(div_test, 'lxml')  # the 'lxml' parser requires the lxml package
s = soup.find(id='d2').find_all(text=True, recursive=False)
print(s)
print([e.strip() for e in s])  # strip leading and trailing whitespace

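It returns a list containing only the text nodes, first raw and then stripped (the u prefix shows this output came from Python 2; Python 3 prints the same strings without it):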

[u'\n        Text 2\n        ', u'\n        Text 2 continue\n    ']
[u'Text 2', u'Text 2 continue']
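
As a variation on the same idea, here is a hedged sketch that produces the two-line result asked for in the question. It assumes Beautiful Soup 4.4.0 or later, where the `text` argument is also available under the name `string`, and it uses the built-in `html.parser` so that lxml is not needed. The names `html_doc`, `direct_strings` and `lines` are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumed to be the same HTML string as in the question.
html_doc = """
<html>
    <div id="d1">
        Text 1
    </div>
    <div id="d2">
        Text 2
        <a href="http://my.url/">a url</a>
        Text 2 continue
    </div>
    <div id="d3">
        Text 3
    </div>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
div = soup.find(id='d2')

# string=True matches text nodes only; recursive=False limits the
# search to direct children, so the <a> tag's text is not visited.
direct_strings = div.find_all(string=True, recursive=False)

# Strip whitespace and drop any strings that were whitespace only.
lines = [s.strip() for s in direct_strings if s.strip()]
print('\n'.join(lines))
```

With this input the script prints "Text 2" and "Text 2 continue" on separate lines. Filtering out empty strings is a small precaution for markup where two child tags are separated only by whitespace.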
