How to delete a tag from a node in lxml without removing its tail?

Views: 162 | Posted: 2020/9/20 7:43:29 | Tags: python, beautifulsoup, html-parsing, lxml


Question

Example:

html = <a><b>Text</b>Text2</a>

BeautifulSoup code:

[x.extract() for x in html.findAll('b')]

The result is:

html = <a>Text2</a>

lxml code:

[bad.getparent().remove(bad) for bad in html.xpath(".//b")]

The result is:

html = <a></a>

This happens because lxml treats "Text2" as the tail of <b></b>, and removing an element removes its tail along with it.
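The tail behaviour can be checked directly on the example. This is a minimal sketch that assumes the markup is parsed with `lxml.etree.fromstring` (the question does not show how `html` was built); with the XML serializer an empty element prints as `<a/>` rather than `<a></a>`.

```python
from lxml import etree

# Parse the example from the question (assumption: parsed as XML).
a = etree.fromstring("<a><b>Text</b>Text2</a>")
b = a[0]

# "Text2" is not stored on <a>; it is the tail of <b>.
text_of_a = a.text      # None
tail_before = b.tail    # 'Text2'
print(repr(text_of_a), repr(b.text), repr(tail_before))

# Removing <b> takes its tail with it, so "Text2" disappears too.
a.remove(b)
result = etree.tostring(a)
print(result)  # b'<a/>'
```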

If all we need is the joined text of the tags, we can use:

for bad in raw.xpath(xpath_search):
    bad.text = ''
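A runnable sketch of this text-only approach follows. The names `root` and the XPath `.//b` stand in for the undefined `raw` and `xpath_search` above and are assumptions. Joining with `itertext()` is one plausible way to collect the remaining text, not necessarily what the asker used. Note that only the direct text of each matched element is blanked, so text inside nested children would survive.

```python
from lxml import etree

# Assumed input: the example document from the question.
root = etree.fromstring("<a><b>Text</b>Text2</a>")

# Blank the text of every <b>; each tail stays attached to its element.
for bad in root.xpath(".//b"):
    bad.text = ""

# itertext() yields text and tail strings in document order.
remaining_text = "".join(root.itertext())
print(remaining_text)  # Text2
```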

But how can we remove the tags themselves, without changing the text and without losing the tail?

Recommended answer

I did the following to save the tail text by moving it onto the previous sibling or, if there is none, onto the parent. (Both functions are methods of a class, hence the self parameter.)

def remove_keeping_tail(self, element):
    """Save the tail text and then delete the element"""
    self._preserve_tail_before_delete(element)
    element.getparent().remove(element)

def _preserve_tail_before_delete(self, node):
    if node.tail: # preserve the tail
        previous = node.getprevious()
        if previous is not None: # if there is a previous sibling it will get the tail
            if previous.tail is None:
                previous.tail = node.tail
            else:
                previous.tail = previous.tail + node.tail
        else: # no previous sibling: the parent gets the tail as text
            parent = node.getparent()
            if parent.text is None:
                parent.text = node.tail
            else:
                parent.text = parent.text + node.tail
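For use outside a class, the same idea can be sketched as a single standalone function. This is a condensed variant under stated assumptions, not the answerer's exact code: the `self` parameter is dropped, the two methods are merged, and the root-element check and the sample document are additions made for illustration.

```python
from lxml import etree


def remove_keeping_tail(element):
    """Remove element from its parent, but keep its tail text in the tree."""
    parent = element.getparent()
    if parent is None:
        # Added guard (assumption): a root element has nothing to hold its tail.
        raise ValueError("cannot remove the root element")
    if element.tail:
        previous = element.getprevious()
        if previous is not None:
            # Append the tail to the previous sibling's tail.
            previous.tail = (previous.tail or "") + element.tail
        else:
            # No previous sibling: the tail becomes part of the parent's text.
            parent.text = (parent.text or "") + element.tail
    parent.remove(element)


# Hypothetical sample covering both branches (first child and later sibling).
root = etree.fromstring("<a>x<b>Text</b>Text2<c/><b>T</b>Text3</a>")
for bad in root.xpath(".//b"):
    remove_keeping_tail(bad)

result = etree.tostring(root)
print(result)  # b'<a>xText2<c/>Text3</a>'
```

The first `<b>` has no previous sibling, so its tail is appended to the text of `<a>`; the second `<b>` hands its tail to `<c/>`.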

HTH
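As an alternative that is not part of the answer above, lxml also provides `etree.strip_elements`, whose `with_tail` keyword controls whether the tail text is removed together with the element. A minimal sketch on the question's example, again assuming XML parsing:

```python
from lxml import etree

root = etree.fromstring("<a><b>Text</b>Text2</a>")

# Remove every <b> subtree; with_tail=False leaves the tail text in the tree.
etree.strip_elements(root, "b", with_tail=False)

stripped = etree.tostring(root)
print(stripped)  # b'<a>Text2</a>'
```

This removes all matching descendants in one call, whereas the hand-written helper is more convenient when elements are selected individually, for example by an XPath expression.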
