Why does casting a UTF-8 VARCHAR column to XML require converting to NVARCHAR and encoding change?


Question



I am trying to convert data in a varchar column to XML but I was getting errors with certain characters. Running this ...

-- This fails
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(@Data AS XML) AS DataXml

... results in the following error

Msg 9420, Level 16, State 1, Line 3
XML parsing: line 1, character 55, illegal xml character

It appears that the broken pipe character (¦) is causing the error, but I thought it was a valid character for UTF-8. Looking at the XML spec, it appears to be valid.

When I change it to this ...

-- This works
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(CAST(@Data AS NVARCHAR(MAX)), 'encoding="utf-8"', '') AS XML) AS DataXml

... it works without error (replacing the encoding string with utf-16 also works). I'm using SQL Server 2008 R2 with the SQL_Latin1_General_CP1_CI_AS collation.

Can anyone tell me why I need to convert to NVARCHAR and strip the encoding="utf-8" for this to work?

Thanks,

Edit

It appears that this also works ...

DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(@Data, 'encoding="utf-8"', '') AS XML) AS DataXml

Removing the utf-8 encoding from the prolog is sufficient for SQL Server to do the conversion.

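Solution

Your pipe character is using Unicode codepoint U+00A6 BROKEN BAR instead of U+007C VERTICAL LINE. U+00A6 is outside of ASCII. A VARCHAR value is stored in the code page of its collation (Windows-1252 for SQL_Latin1_General_CP1_CI_AS), not in UTF-8, so the broken bar is held as the single byte 0xA6. The prolog, however, tells the XML parser to read the bytes as UTF-8, and a lone 0xA6 is not a valid UTF-8 sequence, which is why parsing fails at character 55. Removing the encoding declaration lets SQL Server read the VARCHAR in its own code page, which is why your edit works. Casting to NVARCHAR, which is designed to handle Unicode data, gives the parser UTF-16 instead, so the utf-8 declaration then has to be removed or changed to utf-16 to match.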
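
If it helps to see the bytes involved, here is a minimal sketch in Python (not T-SQL, and not SQL Server's XML parser). It assumes the VARCHAR value is stored as Windows-1252, the code page associated with SQL_Latin1_General_CP1_CI_AS; the helper name can_decode is made up for this illustration.

```python
# Illustration only: this mimics the bytes involved, not SQL Server itself.
BROKEN_BAR = "\u00a6"  # the '¦' character from the question

# Assumption: a VARCHAR under SQL_Latin1_General_CP1_CI_AS is stored as Windows-1252.
varchar_bytes = ("<NewDataSet>Test" + BROKEN_BAR + "</NewDataSet>").encode("cp1252")


def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if the bytes form valid text in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


stored_byte = BROKEN_BAR.encode("cp1252")  # one byte in the code page
utf8_bytes = BROKEN_BAR.encode("utf-8")    # two bytes in real UTF-8

print(stored_byte.hex(), utf8_bytes.hex())
print(can_decode(varchar_bytes, "utf-8"))   # stored bytes read as UTF-8
print(can_decode(varchar_bytes, "cp1252"))  # stored bytes read in their own code page
```

The stored bytes only decode when they are read in the code page they were written in. That mirrors the mismatch between the VARCHAR data and an encoding="utf-8" declaration in the prolog.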
