How do I ignore tags while getting the .string of a Beautiful Soup element?

Views: 24 | Published: 2021/12/23 20:09:08 | Tags: python dom html-parsing beautifulsoup

Question

I'm working with HTML elements that have child tags, which I want to "ignore" or remove so that only the text remains. Right now, if I access .string on any element that contains tags, all I get is None.

import bs4

soup = bs4.BeautifulSoup("""
    <div id="main">
      <p>This is a paragraph.</p>
      <p>This is a paragraph <span class="test">with a tag</span>.</p>
      <p>This is another paragraph.</p>
    </div>
""", "html.parser")

main = soup.find(id='main')
for child in main.children:
    print(child.string)

Output:

This is a paragraph.
None
This is another paragraph.

I want the second line to be "This is a paragraph with a tag." How do I do this?

Solution

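# Iterating over a Tag yields its direct children; skip the bare text nodes.
for child in soup.find(id='main'):
    if isinstance(child, bs4.Tag):
        # .text joins all nested strings, so inner tags are ignored.
        print(child.text)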

With this, you'll get:

This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
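
For reference, here is a self-contained Python 3 sketch of the same idea, using the HTML from the question. It contrasts .string, which is None for a tag that has more than one child, with get_text(), which concatenates every nested string. The names HTML and paragraph_texts are hypothetical helpers introduced only for this illustration, and passing "html.parser" explicitly is an assumption about which parser to use; other installed parsers should behave the same way for this input.

```python
import bs4

# Same markup as in the question (HTML is a hypothetical constant name).
HTML = """
<div id="main">
  <p>This is a paragraph.</p>
  <p>This is a paragraph <span class="test">with a tag</span>.</p>
  <p>This is another paragraph.</p>
</div>
"""


def paragraph_texts(html):
    """Return the text of each <p> inside #main, ignoring nested tags.

    Hypothetical helper, not part of Beautiful Soup.
    """
    soup = bs4.BeautifulSoup(html, "html.parser")
    main = soup.find(id="main")
    # find_all("p") skips the whitespace-only text nodes between paragraphs.
    return [p.get_text() for p in main.find_all("p")]


soup = bs4.BeautifulSoup(HTML, "html.parser")
second = soup.find_all("p")[1]

# .string is None here because this <p> has three children:
# a text node, a <span>, and another text node.
print(second.string)

# get_text() joins the text of all descendants, so the <span> is ignored.
print(second.get_text())

for text in paragraph_texts(HTML):
    print(text)
```

Selecting the paragraphs with find_all("p") instead of iterating over main.children avoids the isinstance check, because the whitespace between the tags never enters the loop.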
