How should the '\t' character be handled within XML attribute values?

Question

I seem to have found an inconsistency between the various XML implementations in .NET 3.5, and I'm struggling to work out which one is nominally correct.

The issue is fairly easy to reproduce:

  1. Create a simple XML document with a text element containing '\t' characters, and give it an attribute that also contains '\t' characters:

    var xmlDoc = new XmlDocument { PreserveWhitespace = false, };
    xmlDoc.LoadXml("<test><text attrib=\"Tab'\t'space' '\">Tab'\t'space' '</text></test>");
    xmlDoc.Save(@"d:\TabTest.xml");

NB: This means that XmlDocument itself is quite happy with '\t' characters in an attribute value.

  2. Load the document using XmlReader.Create:

    var rawFile = XmlReader.Create(@"D:\TabTest.xml");
    var rawDoc = new XmlDocument();
    rawDoc.Load(rawFile);

  3. Load the document using new XmlTextReader:

    var rawFile2 = new XmlTextReader(@"D:\TabTest.xml");
    var rawDoc2 = new XmlDocument();
    rawDoc2.Load(rawFile2);
    

  4. Compare the documents in the debugger:

    (rawDoc).InnerXml   "<test><text attrib=\"Tab' 'space' '\">Tab'\t'space' '</text></test>"   string
    (rawDoc2).InnerXml  "<test><text attrib=\"Tab'\t'space' '\">Tab'\t'space' '</text></test>"  string
    

The document read using new XmlTextReader was what I expected: the '\t' in both the text value and the attribute value was preserved. However, in the document read via XmlReader.Create, the '\t' character in the attribute value has been converted into a ' ' character.

What the...!! :-)

After a bit of Googling I found that I could encode a '\t' as '&#x9;'. If I use this instead of '\t' in the example XML, both readers work as expected.

Altova XMLSpy and various other XML readers seem to be perfectly happy with '\t' characters in attribute values, so my question is: what is the correct way to handle this?

Should I be writing XML files with the '\t' characters in attribute values encoded, as XmlReader.Create expects? Or are the other XML tools right, so that '\t' characters are valid and XmlReader.Create is broken?

Which way should I go to fix or work around this issue?
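The difference between a literal tab and the '&#x9;' character reference can be reproduced in memory, without touching the file system. The following is a minimal sketch, assuming XmlReader.Create with default settings; the class name TabNormalizationDemo and the helper ReadAttrib are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class TabNormalizationDemo
{
    // Hypothetical helper: parses the XML with XmlReader.Create (default
    // settings) and returns the root element's 'attrib' attribute value.
    public static string ReadAttrib(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();               // advance to the root element
            return reader.GetAttribute("attrib");
        }
    }

    public static void Main()
    {
        // A literal tab character inside the attribute value.
        string literal = ReadAttrib("<text attrib=\"a\tb\" />");

        // The same value with the tab written as a character reference.
        string encoded = ReadAttrib("<text attrib=\"a&#x9;b\" />");

        Console.WriteLine(literal == "a b");   // True: the literal tab was read as a space
        Console.WriteLine(encoded == "a\tb");  // True: the character reference was read as a tab
    }
}
```

The two comparisons mirror the observation above: only the literal tab is altered on read, while the encoded form comes back unchanged.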

Recommended answer

@all: Thanks for all your answers and comments.

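It would seem that Justin and Michael Kay are correct: white space in attribute values should be encoded according to the W3C XML specification, and the issue is that a significant number of the MS implementations do not honour this requirement. Under the specification's attribute-value normalization rules, a conforming parser replaces a literal tab, line feed or carriage return in an attribute value with a space, whereas a character reference such as '&#x9;' is kept as a tab.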

In my case, XML specification aside, all I really want is for the attribute values to be persisted correctly, i.e. the values read back should be exactly the values that were saved.

The answer is to force the use of an XmlWriter created with the XmlWriter.Create method when saving the XML files in the first place.

While both DataSet and XmlDocument provide save/write mechanisms, neither of them correctly encodes white space in attributes when used in its default form. If I force them to use a manually created XmlWriter, however, the correct encoding is applied and written to the file.

So the original file save code becomes:

    var xmlDoc = new XmlDocument { PreserveWhitespace = false, };
    xmlDoc.LoadXml("<test><text attrib=\"Tab'\t'space' '\">Tab'\t'space' '</text></test>");
    
    using (var xmlWriter = XmlWriter.Create(@"d:\TabTest.Encoded.xml"))
    {
        xmlDoc.Save(xmlWriter);
    }
    

This writer encodes the white space in a way that is symmetrical with the XmlReader.Create reader, which can then read the file without altering the attribute values.

The other thing to note is that this solution hides the encoding from my code entirely, because the writer and reader perform the encoding and decoding transparently on write and read.
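The round trip can be checked in memory as well. This is only a sketch under stated assumptions: it uses a StringBuilder in place of the file on disk, default XmlWriter and XmlReader settings, and the hypothetical names TabRoundTripDemo, Serialize and RoundTrip, none of which come from the original code.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class TabRoundTripDemo
{
    // Hypothetical helper: builds <test><text attrib="..."/></test> and
    // serializes it through a writer obtained from XmlWriter.Create.
    public static string Serialize(string attribValue)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<test><text /></test>");
        doc.DocumentElement["text"].SetAttribute("attrib", attribValue);

        var sb = new StringBuilder();
        using (var writer = XmlWriter.Create(sb))
        {
            doc.Save(writer);
        }
        return sb.ToString();
    }

    // Hypothetical helper: writes the value, reads it back through
    // XmlReader.Create and returns what the reader delivered.
    public static string RoundTrip(string attribValue)
    {
        string xml = Serialize(attribValue);
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            var reloaded = new XmlDocument();
            reloaded.Load(reader);
            return reloaded.DocumentElement["text"].GetAttribute("attrib");
        }
    }

    public static void Main()
    {
        const string original = "Tab\tspace ";

        // The serialized text carries the tab as a character reference.
        Console.WriteLine(Serialize(original).Contains("&#x9;"));

        // The value read back matches the value that was saved.
        Console.WriteLine(RoundTrip(original) == original);
    }
}
```

Because the calling code only ever sees the decoded string, nothing outside the writer and reader needs to know that the tab was stored as '&#x9;'.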
