Why does casting a UTF-8 VARCHAR column to XML require converting to NVARCHAR and changing the encoding?
Question
I am trying to convert data in a varchar column to XML but I was getting errors with certain characters. Running this ...
-- This fails
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(@Data AS XML) AS DataXml;
... results in the following error
Msg 9420, Level 16, State 1, Line 3
XML parsing: line 1, character 55, illegal xml character
It appears that it's the broken pipe character that is causing the error, but I thought that it was a valid character for UTF-8. Looking at the XML spec, it appears to be valid.
When I change it to this ...
-- This works
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(CAST(@Data AS NVARCHAR(MAX)), 'encoding="utf-8"', '') AS XML) AS DataXml;
... it works without error (replacing the encoding string with utf-16 also works). I'm using SQL Server 2008 R2 with the SQL_Latin1_General_CP1_CI_AS collation.
Can anyone tell me why I need to convert to NVARCHAR and strip the encoding="utf-8" for this to work?

Thanks,
Edit
It appears that this also works ...
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(@Data, 'encoding="utf-8"', '') AS XML) AS DataXml;
Removing the utf-8 encoding from the prolog is sufficient for SQL Server to do the conversion.
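Solution
Your pipe character is using Unicode code point U+00A6 BROKEN BAR instead of U+007C VERTICAL LINE. U+00A6 is outside of ASCII. Under the SQL_Latin1_General_CP1_CI_AS collation (code page 1252), a VARCHAR stores it as the single byte 0xA6. On its own, that byte is not a valid UTF-8 sequence (in UTF-8, U+00A6 is the two bytes 0xC2 0xA6), so when the prolog declares encoding="utf-8" the XML parser rejects it as an illegal character.

Converting to NVARCHAR, which is designed to handle Unicode data, turns the string into UTF-16. At that point the utf-8 declaration contradicts the actual encoding, so it has to be removed or changed to utf-16. Removing the declaration from the VARCHAR string, as in your edit, also works, because the parser then decodes the bytes using the collation's code page instead of as UTF-8.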
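The byte-level difference can be inspected directly in T-SQL. This is a minimal sketch, assuming a database whose default collation is SQL_Latin1_General_CP1_CI_AS (code page 1252), as in the question; under a different collation the VARCHAR byte may differ. The two casts at the end reuse the variants the question reports as working.

```sql
-- Assumption: database default collation is SQL_Latin1_General_CP1_CI_AS (code page 1252).

-- 1. How the broken bar is represented in each type
SELECT
    UNICODE(N'¦')                AS CodePoint,      -- 166, i.e. U+00A6
    CONVERT(VARBINARY(10), '¦')  AS VarcharBytes,   -- 0xA6: one code page 1252 byte, not valid UTF-8 on its own
    CONVERT(VARBINARY(10), N'¦') AS NvarcharBytes;  -- 0xA600: UTF-16LE

-- 2. Casts whose declared encoding matches the actual bytes
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';

-- NVARCHAR is UTF-16, so declare utf-16 instead of utf-8
SELECT CAST(REPLACE(CAST(@Data AS NVARCHAR(MAX)), 'encoding="utf-8"', 'encoding="utf-16"') AS XML) AS AsUtf16;

-- No declaration: the VARCHAR bytes are decoded with the collation's code page
SELECT CAST(REPLACE(@Data, 'encoding="utf-8"', '') AS XML) AS AsCodePage;
```

Both casts should return `<NewDataSet>Test¦</NewDataSet>`; the point is that the declared encoding must describe the bytes actually handed to the parser.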