XML validation error: Char 0x0 out of allowed range.


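Problem description

How do I handle invalid characters so that I can parse the data in Python?

I am using a REST API to fetch data from a source that returns XML. However, the XML data contains these characters: ¿¿

When I try to validate the data, I get this error at that point:

Char 0x0 out of allowed range.

As a result, I cannot parse the data, and I am not sure how to encode it. What can I do to solve this problem?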

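Solution

0x0 (also known as NUL) is not an allowed character in XML. XML 1.0 defines Char as:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Therefore your data is not XML, and any conformant XML processor must report an error such as the one you received.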
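To see which characters in a payload fall outside the Char production, a character class can be built directly from the ranges quoted above. This is a minimal sketch, not part of the original answer; the function name find_illegal_xml_chars is hypothetical.

```python
import re

# Negated character class built from XML 1.0 production [2] (Char).
# Anything matched by this pattern is NOT a legal XML character.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (index, hex code point) for each character XML does not allow.

    Hypothetical helper for diagnosis only; it does not modify the text.
    """
    return [
        (match.start(), hex(ord(match.group())))
        for match in _ILLEGAL_XML_CHARS.finditer(text)
    ]


if __name__ == "__main__":
    sample = "<a>\x00b</a>"  # made-up payload containing a NUL
    print(find_illegal_xml_chars(sample))
```

Running a check like this on the raw response text shows where the NUL characters sit before any XML library is involved.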

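You must repair the data before using it with any XML library: treat it as text, not XML, and remove any illegal characters, either manually or automatically.

For Python, see "Removing control characters from a string in python" for tips on how to remove NUL from a string. This must be done before treating the data as XML.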
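One way to do this repair is sketched below, assuming the response body has already been decoded to a Python str. The helper name strip_illegal_xml_chars and the sample payload are hypothetical, and the parser used here (xml.etree.ElementTree) is just one choice; the question does not say which library raised the error.

```python
import re
import xml.etree.ElementTree as ET

# Characters outside XML 1.0 production [2] (Char), including NUL (0x0).
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Remove every character that XML 1.0 does not allow.

    Operates on plain text; it must run BEFORE the data is handed to a parser.
    """
    return _ILLEGAL_XML_CHARS.sub("", text)


if __name__ == "__main__":
    raw = "<item>abc\x00def</item>"  # made-up payload with an embedded NUL
    cleaned = strip_illegal_xml_chars(raw)
    root = ET.fromstring(cleaned)  # parse only after cleaning
    print(root.text)
```

Deleting the characters is the simplest repair, but it silently changes the payload. If the NULs carry meaning for the data source, replacing them with a visible placeholder may be a better fit.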

