Handling surrogate pairs while parsing XML using libxml2

Views: 121 Published: 2020/4/30 10:52:18 Tags: xml parsing xml-parsing libxml2


Question

I am trying to parse XML using libxml2. However, the input sometimes contains character references to surrogate code points, which are outside the range allowed by http://www.w3.org/TR/REC-xml/#NT-Char
Because of this, libxml2 cannot parse the document and reports an error. Can somebody tell me how to handle surrogate pairs while parsing XML with libxml2?

An example of the XML I want to parse:

<?xml version="1.0" encoding="UTF-8"?>
<message><body>  &#xD83D;&#xD83D;</body></message>

Recommended answer

Note that 0xD83D is a high surrogate. A surrogate pair consists of a high surrogate followed by a low surrogate; two high surrogates next to each other are not a "surrogate pair", they are nonsense.

Also note that the correct way to represent a non-BMP character in XML is as a single character reference for the combined character, for example 𒂫 written as one reference to its code point. Splitting a non-BMP character into two surrogates is needed in some character encodings (UTF-16, for instance), but it is neither needed nor allowed in XML character references. Character references in XML represent Unicode code points, not the numeric values specific to a particular character encoding.
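To make the relationship between the two forms concrete, here is a minimal sketch of the standard UTF-16 arithmetic that turns a high/low surrogate pair into the single code point an XML character reference should carry. The function names `combine_surrogates` and `char_reference` are hypothetical, chosen only for this illustration, and the low surrogate 0xDE00 in the usage note is an assumed value, since the question's sample contains no low surrogate.

```python
def combine_surrogates(high, low):
    """Return the code point encoded by a UTF-16 surrogate pair.

    Raises ValueError if the two values are not a high surrogate
    (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF).
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: 0x{:X}".format(high))
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: 0x{:X}".format(low))
    # Each surrogate carries 10 bits of the offset above U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def char_reference(code_point):
    """Format a code point as a hexadecimal XML character reference."""
    return "&#x{:X};".format(code_point)
```

For example, `char_reference(combine_surrogates(0xD83D, 0xDE00))` gives `&#x1F600;`, while `combine_surrogates(0xD83D, 0xD83D)` raises `ValueError`, because the second value is not a low surrogate.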

If you can't fix the program that created this bad XML, the best approach is to repair the input before parsing it, using a script (for example in Perl) that looks for the invalid pairs of character references and replaces each pair with the correct XML representation.
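A repair pass of this kind could look roughly like the sketch below. It is written in Python rather than Perl and is not the answerer's script. It assumes the bad references are in hexadecimal form (`&#x...;`), as in the question's sample. The names `repair_surrogate_refs` and `find_surrogate_refs` are hypothetical. The text would be run through it before being handed to libxml2.

```python
import re

# A high-surrogate reference (D800-DBFF) immediately followed by a
# low-surrogate reference (DC00-DFFF). Assumption: hex references only.
_PAIR_REF = re.compile(
    r"&#x0*([dD][89abAB][0-9a-fA-F]{2});&#x0*([dD][c-fC-F][0-9a-fA-F]{2});"
)

# Any single reference to a surrogate code point (D800-DFFF).
_ANY_SURROGATE_REF = re.compile(r"&#x0*[dD][89a-fA-F][0-9a-fA-F]{2};")


def repair_surrogate_refs(xml_text):
    """Replace each high+low surrogate reference pair with one reference."""
    def fix(match):
        high = int(match.group(1), 16)
        low = int(match.group(2), 16)
        code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
        return "&#x{:X};".format(code_point)
    return _PAIR_REF.sub(fix, xml_text)


def find_surrogate_refs(xml_text):
    """List surrogate references still present, e.g. unpaired ones."""
    return _ANY_SURROGATE_REF.findall(xml_text)
```

With this sketch, `repair_surrogate_refs("<body>&#xD83D;&#xDE00;</body>")` returns `<body>&#x1F600;</body>`. The question's own sample, `&#xD83D;&#xD83D;`, is left unchanged, since two high surrogates do not form a pair. `find_surrogate_refs` reports both references, so the caller can decide whether to drop them or reject the document.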
