Remove a tag using BeautifulSoup but keep its contents

Views: 1718 | Published: 2016/8/5 18:52:57 | Tags: python, beautifulsoup

Question

Currently I have code that does something like this:

soup = BeautifulSoup(value)

for tag in soup.findAll(True):
    if tag.name not in VALID_TAGS:
        tag.extract()
soup.renderContents()

The problem is that tag.extract() also throws away everything inside the invalid tag, which I don't want. How do I get rid of the tag itself but keep its contents when calling soup.renderContents()?

Recommended answer

The strategy I used is to replace each invalid tag with its contents. Children that are already of type NavigableString are kept as they are; any child that is itself a tag is first processed recursively, so that its own invalid tags are stripped, and is then appended as a string. Try this:

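# Written for Python 2 and BeautifulSoup 3 (the `BeautifulSoup` module, not `bs4`).
from BeautifulSoup import BeautifulSoup, NavigableString

def strip_tags(html, invalid_tags):
    soup = BeautifulSoup(html)

    for tag in soup.findAll(True):
        if tag.name in invalid_tags:
            # Rebuild the contents of this tag as a single string.
            s = ""

            for c in tag.contents:
                if not isinstance(c, NavigableString):
                    # Nested tag: strip the invalid tags inside it first.
                    c = strip_tags(unicode(c), invalid_tags)
                s += unicode(c)

            # Swap the tag for its flattened contents.
            tag.replaceWith(s)

    return soup

html = "<p>Good, <b>bad</b>, and <i>ug<b>l</b><u>y</u></i></p>"
invalid_tags = ['b', 'i', 'u']
print strip_tags(html, invalid_tags)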

The result is:

<p>Good, bad, and ugly</p>
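
For readers on Python 3 with the newer `bs4` package, the same effect can probably be had more directly with `Tag.unwrap()`, which replaces a tag with its children. The following is only a sketch under those assumptions: the function name `strip_tags_bs4` and the choice of the `html.parser` backend are illustrative and are not part of the original answer.

```python
from bs4 import BeautifulSoup


def strip_tags_bs4(html, invalid_tags):
    """Drop every tag named in invalid_tags but keep what is inside it."""
    # "html.parser" is the stdlib backend; it parses fragments without
    # wrapping them in <html>/<body>. (Assumed choice, not from the answer.)
    soup = BeautifulSoup(html, "html.parser")

    # find_all() accepts a list of tag names and returns a fixed result list,
    # so unwrapping while looping does not disturb the iteration.
    for tag in soup.find_all(invalid_tags):
        tag.unwrap()  # replace the tag with its own children

    return str(soup)


html = "<p>Good, <b>bad</b>, and <i>ug<b>l</b><u>y</u></i></p>"
print(strip_tags_bs4(html, ["b", "i", "u"]))
# <p>Good, bad, and ugly</p>
```

Because `unwrap()` moves the children up into the parent, nested invalid tags need no explicit recursion here.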

I gave this same answer on another question. It seems to come up a lot.
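
The question filters by a whitelist (`VALID_TAGS`) rather than a list of tags to remove. A whitelist variant might look like the sketch below, assuming Python 3 and the `bs4` package; the function name `keep_only_valid_tags` and the sample whitelist are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical whitelist; the question does not say what VALID_TAGS contains.
VALID_TAGS = {"p", "em", "strong"}


def keep_only_valid_tags(html, valid_tags=VALID_TAGS):
    """Unwrap every tag whose name is not whitelisted, keeping its contents."""
    soup = BeautifulSoup(html, "html.parser")

    # find_all(True) matches every tag in the document.
    for tag in soup.find_all(True):
        if tag.name not in valid_tags:
            tag.unwrap()  # remove the tag, keep its children in place

    return str(soup)


print(keep_only_valid_tags("<p>Good, <b>bad</b>, and <i>ug<b>l</b><u>y</u></i></p>"))
# <p>Good, bad, and ugly</p>
```

This mirrors the loop in the question, with `tag.extract()` swapped for `tag.unwrap()` so the contents survive.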
