Can CDATA sections be preserved by BeautifulSoup?

Published: 2021/12/23 20:07:42 | Tags: python, xml, beautifulsoup, lxml, cdata

Problem description

I'm using BeautifulSoup to read, modify, and write an XML file, but the CDATA sections are being stripped out. Here's a simplified example.

The culprit XML file (cdata.xml):

<?xml version="1.0" ?>
<foo>
    <bar><![CDATA[
        !@#$%^&*()_+{}|:"<>?,./;'[]-=
    ]]></bar>
</foo>

And here's the Python script:

from bs4 import BeautifulSoup

# Parse the file with the lxml-backed "xml" parser; the with block closes the file.
with open("cdata.xml", "r") as xmlfile:
    soup = BeautifulSoup(xmlfile, "xml")

print(soup)

Here's the output. Note that the CDATA section markers are missing and the special characters have been escaped as entities:

<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar>
        !@#$%^&amp;*()_+{}|:"&lt;&gt;?,./;'[]-=
    </bar>
</foo>

I also tried printing soup.prettify(formatter="xml") and got the same result, with slightly different whitespace. The docs say little about reading CDATA sections, so maybe this is an lxml thing?

Is there a way to tell BeautifulSoup to preserve CDATA sections?
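
One possible workaround, sketched here rather than taken from the original post, is to re-wrap the affected text in bs4's CData string class after parsing, so the markers are written back out on serialization. This assumes you already know which elements held CDATA (here, hypothetically, every bar element), because the parsed tree no longer records that. The XML is inlined as a string only to keep the sketch self-contained.

```python
from bs4 import BeautifulSoup, CData

# Same content as cdata.xml, inlined so the sketch is self-contained.
XML_TEXT = """<?xml version="1.0" ?>
<foo>
    <bar><![CDATA[
        !@#$%^&*()_+{}|:"<>?,./;'[]-=
    ]]></bar>
</foo>
"""

soup = BeautifulSoup(XML_TEXT, "xml")

# Assumption for illustration: every <bar> element should be CDATA.
for bar in soup.find_all("bar"):
    if bar.string is not None:
        # CData is a string subclass that serializes as <![CDATA[...]]>.
        bar.string.replace_with(CData(bar.string))

rewrapped = str(soup)
print(rewrapped)
```

This does not recover the original markers; it only adds new ones where the code decides they belong.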

Update: Yes, it's an lxml thing (see http://lxml.de/api.html#cdata). So the question becomes: is it possible to tell BeautifulSoup to initialize lxml with strip_cdata=False?
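
For reference, here is a minimal sketch of the strip_cdata=False option used with lxml directly. It bypasses BeautifulSoup entirely, so it does not show whether BeautifulSoup can forward the option; it only illustrates the lxml behavior the linked page describes. The XML is inlined as bytes to keep the sketch self-contained.

```python
from lxml import etree

# Same content as cdata.xml, inlined so the sketch is self-contained.
XML_TEXT = b"""<?xml version="1.0" ?>
<foo>
    <bar><![CDATA[
        !@#$%^&*()_+{}|:"<>?,./;'[]-=
    ]]></bar>
</foo>
"""

# strip_cdata=False asks lxml to keep CDATA sections instead of
# merging them into ordinary text nodes.
parser = etree.XMLParser(strip_cdata=False)
root = etree.fromstring(XML_TEXT, parser)

# The CDATA markers survive serialization, so the text is not escaped.
serialized = etree.tostring(root, encoding="unicode")
print(serialized)
```

If the whole read, modify, and write cycle can be done with lxml's own element API, this may be enough without BeautifulSoup.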
