Useful use cases for escape character format _xHHHH_ in Office Open XML?

Question

The default encoding in Office Open XML is UTF-8, so Unicode is already possible. Nevertheless, Microsoft defines the following in ECMA-376 Part 1, 22.4 Variant Types, 22.4.2.4 bstr (Basic String):

22.4.2.4 bstr (Basic String)

This element defines a binary basic string variant type, which can store any valid Unicode character. Unicode characters that cannot be directly represented in XML as defined by the XML 1.0 specification, shall be escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character's value. [Example: The Unicode character 8 is not permitted in an XML 1.0 document, so it shall be escaped as _x0008_. end example] To store the literal form of an escape sequence, the initial underscore shall itself be escaped (i.e. stored as _x005F_). [Example: The string literal _x0008_ would be stored as _x005F_x0008_. end example]

The possible values for this element are defined by the W3C XML Schema string datatype.
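
The escaping rule quoted above can be sketched in Python. This is only an illustration under stated assumptions, not how any Office application implements it: the names `is_xml10_char` and `escape_ooxml` are hypothetical, the set of representable characters is taken from the `Char` production of XML 1.0, and an underscore is treated as needing protection only when it starts something that looks like `_xHHHH_`.

```python
import re

# An underscore that begins text shaped like an escape sequence, e.g. "_x0008_".
# Assumption: hex digits are matched case-insensitively.
_ESCAPE_LIKE = re.compile(r"_(?=x[0-9A-Fa-f]{4}_)")


def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_ooxml(text: str) -> str:
    """Hypothetical helper: turn a plain string into its stored _xHHHH_ form."""
    # Step 1: protect literal escape-like sequences by escaping their
    # leading underscore as _x005F_.
    protected = _ESCAPE_LIKE.sub("_x005F_", text)
    # Step 2: replace every character XML 1.0 cannot represent with _xHHHH_.
    return "".join(
        ch if is_xml10_char(ch) else f"_x{ord(ch):04X}_" for ch in protected
    )


print(escape_ooxml("\x08"))
print(escape_ooxml("_x0008_"))
```

Protecting literal underscores before escaping the forbidden characters matters here: doing it the other way round would also rewrite the escape sequences that step 2 has just produced.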

This definition extends the W3C XML Schema string datatype, so the character sequence _xHHHH_ has a special meaning, as a kind of entity like &#xHHHH;. That means everyone who needs to parse Office Open XML (*.xlsx, *.docx, *.pptx) must bear this in mind while parsing. For example, if you put "Text _x1234_ text" into an Excel cell, Excel stores it as "Text _x005F_x1234_ text" in the XML. So the string stored in the file differs from the string that was entered, and also from the string that Excel shows in the cell. Conversely, if you put "Text _x1234_ text" into the XML as string cell content, Excel shows "Text ሴ text" in the cell.

See: XSSFCell in Apache POI encodes certain character sequences as unicode character
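
For anyone parsing such files by hand, the reverse direction might look like the following minimal sketch. The function name `unescape_ooxml` is hypothetical, and the pattern assumes exactly four hex digits in either case; it is not taken from any particular library.

```python
import re

# A complete escape sequence: underscore, 'x', four hex digits, underscore.
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def unescape_ooxml(stored: str) -> str:
    """Hypothetical helper: decode _xHHHH_ sequences found in stored XML text."""
    # re.sub scans left to right without rescanning replaced text, so the
    # underscore produced from _x005F_ is not treated as the start of a
    # new escape sequence.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), stored)


# What Excel stores for the typed text "Text _x1234_ text":
print(unescape_ooxml("Text _x005F_x1234_ text"))
# What a hand-written cell value "Text _x1234_ text" decodes to:
print(unescape_ooxml("Text _x1234_ text"))
```

The first call gives back the text as typed, and the second yields the single character U+1234, matching the two Excel behaviours described in the question.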

It is clear to me that XML 1.0 has some characters that cannot be directly represented in XML. But these are control characters, and other users of XML are able to meet those restrictions without such extensions. They use other properly defined encodings (Base64, for example) when content containing control characters is needed.

So I am still looking for some useful use cases for this _xHHHH_ within a string.

Questions:

  1. Can someone enlighten me as to why this special Unicode numerical character representation escape character format _xHHHH_ is needed at all in Office Open XML?

  2. Can someone give any useful use cases for this _xHHHH_ within a string?

Recommended answer

As a use case: all of our databases are isolated as a requirement, and we need to test some jobs/crons/web services against different databases. So we need to export some data to an Excel file and feed it to the job as an input file for another database, to check that it works as expected. Our architecture requires this because of some privilege restrictions.

Hope this is useful to you :)
