Why does en-dash (–) trigger illegal XML character error (C#/SSMS)?

Question

This is not a question about how to overcome the "XML parsing: ... illegal xml character" error, but about why it is happening. I know that there are fixes (1, 2, 3), but I need to know where the problem arises before choosing the best solution (what causes the error under the hood?).

We are calling a Java-based webservice using C#. From the strongly-typed data returned, we are creating an XML file that will be passed to SQL Server. The webservice data is encoded using UTF-8, so in C# we create the file and specify UTF-8 where appropriate:

var encodingType = Encoding.UTF8;
// logic removed...
var xdoc = new XDocument();
xdoc.Declaration = new XDeclaration("1.0", encodingType.WebName, "yes");
// logic removed...
System.IO.File.WriteAllText(xmlFullPath, xdoc.Declaration.ToString() + xdoc.Document.ToString(), encodingType);

This creates an XML file on disk that contains the following (abbreviated) data:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<records>
  <r RecordName="Option - Foo" />
  <r RecordName="Option – Bar" />
</records>

Notice that in the second record, the dash (–) is different from the hyphen (-) in the first. I believe the second instance is an en-dash.

If I open that XML file in Firefox/IE/VS2015, it opens without error. The W3C XML validator also works fine. But SSMS 2012 does not like it:

declare @xml XML = '<?xml version="1.0" encoding="utf-8" standalone="yes"?><records>
  <r RecordName="Option - Foo" />
  <r RecordName="Option – Bar" />
</records>';

XML parsing: line 3, character 25, illegal xml character

So why does the en-dash cause the error? From my research, it would appear that

...there are only a few entities that need escaping: <, >, \, ' and & in both HTML and XML. (Source)

...of which en-dash is not one. An encoded version (replacing the character with &#8211;) works fine.
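The encoded form works because `&#8211;` is simply the decimal code point of U+2013 spelled out in ASCII, so no non-ASCII byte ever reaches the parser. A minimal sketch of that substitution follows; it is written in Python purely for illustration, and the helper name `to_ascii_xml` is hypothetical rather than anything from the original post. It only rewrites non-ASCII characters and does not escape markup characters such as < or &.

```python
def to_ascii_xml(text: str) -> str:
    """Rewrite every non-ASCII character as an XML numeric character reference."""
    # "xmlcharrefreplace" is a standard codec error handler: characters that
    # cannot be encoded as ASCII are emitted as &#NNNN; instead of failing.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# \u2013 is the en-dash from the second record in the question.
escaped = to_ascii_xml('<r RecordName="Option \u2013 Bar" />')
print(escaped)
```

Because the result is pure ASCII, it reads the same in any single-byte code page as it does in UTF-8, which is presumably why this variant survives being passed as a plain string.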

Based on the input, people state that en-dash isn't recognised as UTF-8, yet it is listed here: http://www.fileformat.info/info/unicode/char/2013/index.htm

So, as a perfectly legal character, why won't SSMS read it when passed as XML (using UTF-8 or UTF-16)?

Recommended answer

Can you modify the XML encoding declaration? If so:

declare @xml XML = N'<?xml version="1.0" encoding="utf-16" standalone="yes"?><records>
  <r RecordName="Option - Foo" />
  <r RecordName="Option – Bar" />
</records>';

select @xml

(No column name)
<records><r RecordName="Option - Foo" /><r RecordName="Option – Bar" /></records>

Speculative edit

Both of these fail with "illegal xml character":

set @xml = '<?xml version="1.0" encoding="utf-8"?><x> – </x>'
set @xml = '<?xml version="1.0" encoding="utf-16"?><x> – </x>'

because they pass a non-Unicode varchar to the XML parser. The string contains Unicode, so it must be treated as such, i.e. as an nvarchar (utf-16); otherwise the 3 bytes comprising the – are misinterpreted as multiple characters, and one or more of them is not in the acceptable range for XML.
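A byte-level view may make this kind of mismatch more concrete. The sketch below is Python, used only for illustration, and it assumes the varchar code page is Windows-1252 (an assumption, not something stated in the original post). It shows the en-dash as UTF-8 and as code page 1252, and what happens when each byte sequence is read under the other encoding. Which of the two misreadings actually occurs inside SQL Server is not established here.

```python
EN_DASH = "\u2013"

utf8_bytes = EN_DASH.encode("utf-8")      # three bytes in UTF-8
cp1252_bytes = EN_DASH.encode("cp1252")   # one byte in the assumed varchar code page

print(utf8_bytes.hex(" "))
print(cp1252_bytes.hex(" "))

# Reading the three UTF-8 bytes as cp1252 yields three unrelated characters:
# a-circumflex, the euro sign, and a left double quotation mark.
misread = utf8_bytes.decode("cp1252")
print([f"U+{ord(ch):04X}" for ch in misread])

# Reading the single cp1252 byte as UTF-8 fails outright: 0x96 is a UTF-8
# continuation byte that is not preceded by a lead byte.
try:
    cp1252_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False
print(valid_as_utf8)
```

Either way, the parser is told one encoding and handed bytes from another, which is consistent with the error disappearing once the literal and the declaration agree.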

This does pass an nvarchar string to the parser, but fails with "unable to switch the encoding":

set @xml = N'<?xml version="1.0" encoding="utf-8"?><x> – </x>'

This is because an nvarchar (utf-16) string is passed to the XML parser, but the XML document states that it is utf-8, and the – is not equivalent in the two encodings.
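The difference between the two layouts can be sketched as follows. This is Python for illustration only, and it assumes that an nvarchar value is laid out as little-endian UTF-16; the variable names are hypothetical.

```python
doc = '<?xml version="1.0" encoding="utf-8"?><x> \u2013 </x>'

as_utf16 = doc.encode("utf-16-le")  # layout assumed for the nvarchar value
as_utf8 = doc.encode("utf-8")       # layout the declaration promises

# Every character takes two bytes in UTF-16 here, while in UTF-8 only the
# en-dash needs more than one byte.
print(len(doc), len(as_utf16), len(as_utf8))

# The en-dash itself is encoded differently in the two layouts.
print("\u2013".encode("utf-16-le").hex(" "))
print("\u2013".encode("utf-8").hex(" "))

# Taking the declaration at face value and reading the UTF-16 bytes as UTF-8
# does not reproduce the document: the text comes back interleaved with NULs.
misread = as_utf16.decode("utf-8")
print("\x00" in misread)
```

Under these assumptions the declaration and the actual bytes cannot both be right, which fits the parser refusing to switch encodings rather than guessing.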

This works, as everything is utf-16:

set @xml = N'<?xml version="1.0" encoding="utf-16"?><x> – </x>'
