Clone element with BeautifulSoup

Question

I have to copy a part of one document to another, but I don't want to modify the document I copy from.

If I use .extract(), it removes the element from the tree. If I just append the selected element, as in document2.append(document1.tag), it is still removed from document1.
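The behaviour described above can be reproduced with a minimal sketch. The markup and the "html.parser" choice are assumptions made for illustration, not part of the original question:

```python
from bs4 import BeautifulSoup

# Two small, made-up documents standing in for the real files.
document1 = BeautifulSoup('<div id="src"><p>hello</p></div>', "html.parser")
document2 = BeautifulSoup('<div id="dst"></div>', "html.parser")

# Appending a tag that already belongs to a tree detaches it from that
# tree first, so this moves the <p> rather than copying it.
document2.div.append(document1.p)

print(document1)  # the <p> is no longer in the source document
print(document2)  # the <p> now lives in the target document
```

A tag can only have one parent at a time, which is why a plain append cannot leave the source document intact.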

As I use real files I can just not save document1 after modification, but is there any way to do this without corrupting a document?

Recommended answer

There is no native clone function in BeautifulSoup in versions before 4.4 (released July 2015); you'd have to create a deep copy yourself, which is tricky as each element maintains links to the rest of the tree.

To clone an element and all its children, you'd have to copy all attributes and reset their parent-child relationships; this has to happen recursively. This is best done by not copying the relationship attributes and instead re-seating each recursively cloned element:

from bs4 import Tag, NavigableString

def clone(el):
    if isinstance(el, NavigableString):
        return type(el)(el)

    copy = Tag(None, el.builder, el.name, el.namespace, el.nsprefix)
    # work around bug where there is no builder set
    # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
    copy.attrs = dict(el.attrs)
    for attr in ('can_be_empty_element', 'hidden'):
        setattr(copy, attr, getattr(el, attr))
    for child in el.contents:
        copy.append(clone(child))
    return copy

This method is somewhat sensitive to the BeautifulSoup version; I tested it with 4.3, and future versions may add attributes that need to be copied too.

You could also monkeypatch this functionality into BeautifulSoup:

from bs4 import Tag, NavigableString


def tag_clone(self):
    copy = type(self)(None, self.builder, self.name, self.namespace, 
                      self.nsprefix)
    # work around bug where there is no builder set
    # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
    copy.attrs = dict(self.attrs)
    for attr in ('can_be_empty_element', 'hidden'):
        setattr(copy, attr, getattr(self, attr))
    for child in self.contents:
        copy.append(child.clone())
    return copy


Tag.clone = tag_clone
NavigableString.clone = lambda self: type(self)(self)

This lets you call .clone() directly on elements:

document2.body.append(document1.find('div', id='someid').clone())

My feature request to the BeautifulSoup project was accepted and tweaked to use the copy.copy() function; now that BeautifulSoup 4.4 is released you can use that version (or newer) and do:

import copy

document2.body.append(copy.copy(document1.find('div', id='someid')))
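
As a rough check that copy.copy() leaves the source untouched, here is a self-contained sketch. It assumes BeautifulSoup 4.4 or newer; the sample markup and the "html.parser" choice are made up for illustration. It filters with the `id` keyword, which find() treats as an attribute filter:

```python
import copy

from bs4 import BeautifulSoup

# Made-up sample documents; real code would parse the actual files.
document1 = BeautifulSoup('<div id="someid"><p>hello</p></div>', "html.parser")
document2 = BeautifulSoup("<body></body>", "html.parser")

original = document1.find("div", id="someid")

# copy.copy() returns a detached duplicate of the tag and its children.
duplicate = copy.copy(original)
document2.body.append(duplicate)

# The original is still attached to document1; only the duplicate moved.
print(document1)
print(document2)
```

Because the duplicate has no parent until it is appended, later changes to it in document2 should not affect document1.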
