TSQL“非法XML字符";将Varbinary转换为XML时 [英] TSQL "Illegal XML Character" When Converting Varbinary to XML
问题描述
我正在尝试在SQL Server 2016中创建一个存储过程,该过程将以前转换为 Varbinary
的XML转换回XML,但是在转换时出现非法XML字符"错误.我找到了一种似乎可行的解决方法,但实际上无法弄清楚为什么为什么起作用,这让我感到不舒服.
I'm trying to create a stored procedure in SQL Server 2016 that converts XML that was previously converted into Varbinary
back into XML, but getting an "Illegal XML character" error when converting. I've found a workaround that seems to work, but I can't actually figure out why it works, which makes me uncomfortable.
存储过程获取在SSIS中转换为二进制并插入表的 varbinary(MAX)
列中的数据,并执行简单的
The stored procedure takes data that was converted to binary in SSIS and inserted into a varbinary(MAX)
column in a table and performs a simple
CAST(Column AS XML)
它工作了很长时间,当最初的XML开始包含®(注册商标)符号时,我才开始看到一个问题.
It worked fine for a long time, and I only began seeing an issue when the initial XML started containing an ® (registered trademark) symbol.
现在,当我尝试将二进制文件转换为XML时,出现此错误
Now, when I attempt to convert the binary to XML I get this error
9420级16州立1行23的消息
XML解析:第1行,字符7,非法xml字符
Msg 9420, Level 16, State 1, Line 23
XML parsing: line 1, character 7, illegal xml character
但是,如果我先将二进制文件转换为 varchar(MAX)
,然后将其转换为XML,它似乎可以正常工作.我不了解执行中间的CAST与直接转换为XML不同时发生了什么.我主要担心的是,我不想将其添加到这种情况下,并且最终会带来意想不到的后果.
However, if I first convert the binary to varchar(MAX)
, then convert that to XML, it seems to work fine. I don't understand what is happening when I perform that intermediate CAST that is different than casting directly to XML. My main concern is that I don't want to add it in to account for this scenario and end up with unintended consequences.
测试代码:
DECLARE @foo VARBINARY(MAX)
DECLARE @bar VARCHAR(MAX)
DECLARE @Nbar NVARCHAR(MAX)
--SELECT Varbinary
SET @foo = CAST( '<Test>®</Test>' AS VARBINARY(MAX))
SELECT @foo AsBinary
--select as binary as varchar
SET @bar = CAST(@foo AS VARCHAR(MAX))
SELECT @bar BinaryAsVarchar -- Correct string output
--select binary as nvarchar
SET @nbar = CAST(@foo AS NVARCHAR(MAX))
SELECT @nbar BinaryAsNvarchar -- Chinese characters
--select binary as XML
SELECT TRY_CAST(@foo AS XML) BinaryAsXML -- ILLEGAL XML character
-- SELECT CONVERT(xml, @obfoo) BinaryAsXML --ILLEGAL XML Character
--select BinaryAsVarcharAsXML
SELECT TRY_CAST(@bar AS XML) BinaryAsVarcharAsXML -- Correct Output
--select BinaryAsNVarcharAsXML
SELECT TRY_CAST(@nbar AS XML) BinaryAsNvarcharAsXML -- Chinese Characters
推荐答案
有几件事要知道:
- SQL-Server的字符编码相当有限.有
VARCHAR
,它是 1字节编码的扩展ASCII 和NVARCHAR
,它是UCS-2
(几乎与utf-16
相同). -
VARCHAR
将 plain latin 用于第一组字符,并将排序规则提供的代码页映射用于第二组. -
VARCHAR
不是utf-8 .utf-8
与VARCHAR
一起使用,只要所有字符都编码为1字节即可.但是utf-8
知道很多2字节编码(最多4字节编码)字符,这会破坏VARCHAR
字符串的内部存储./li> -
NVARCHAR
可以在本机上几乎与任何2字节编码字符一起使用(这意味着几乎可以与任何现有字符一起使用).但这并非完全是utf-16
(存在3字节编码字符,这会破坏SQL Server的内部存储). - XML不会存储为您看到的XML字符串,而是存储为基于
NVARCHAR
值的按层次结构组织的物理表. - 本机存储的XML确实非常快,而任何基于文本的存储都需要事先进行非常昂贵的解析操作(一遍又一遍...).
- 将XML存储为字符串是不好的,将XML存储为
VARCHAR
字符串则更糟. - 将
VARCHAR
-string-XML存储为VARBINARY
是对您不应该做的事情的累加.
- SQL-Server is rather limited with character encodings. There is
VARCHAR
, which is 1-byte-encoded extended ASCII andNVARCHAR
, which isUCS-2
(almost the same asutf-16
). VARCHAR
uses plain latin for the first set of characters and a codepage-mapping provided by the collation in use for the second set.VARCHAR
is not utf-8.utf-8
works withVARCHAR
, as long as all characters are 1-byte-enocded. Bututf-8
knows a lot of 2-byte-enocded (up to 4-byte-enocded) characters, which would break the internal storage of aVARCHAR
string.NVARCHAR
will work with almost any 2-byte encoded character natively (that means with almost any existing character). But it is not exactlyutf-16
(there are 3-byte encoded characters, which would break SQL-Servers internal storage).- XML is not stored as the XML-string you see, but as an hierarchically organised physical table, based on
NVARCHAR
values. - The natively stored XML is really fast, while any text-based storage will need a very expensive parse-operation in advance (over and over...).
- Storing XML as string is bad, storing XML as
VARCHAR
string is even worse. - Storing a
VARCHAR
-string-XML asVARBINARY
is a cummulation of things you should not do.
尝试一下:
DECLARE @text1Byte VARCHAR(100)='<test>blah</test>';
DECLARE @text2Byte NVARCHAR(100)=N'<test>blah</test>';
SELECT CAST(@text1Byte AS VARBINARY(MAX)) AS text1Byte_Binary
,CAST(@text2Byte AS VARBINARY(MAX)) AS text2Byte_Binary
,CAST(@text1Byte AS XML) AS text1Byte_XML
,CAST(@text2Byte AS XML) AS text2Byte_XML
,CAST(CAST(@text1Byte AS VARBINARY(MAX)) AS XML) AS text1Byte_XML_via_Binary
,CAST(CAST(@text2Byte AS VARBINARY(MAX)) AS XML) AS text2Byte_XML_via_Binary
您会看到的唯一区别是 0x3C0074006500730074003E0062006C00610068003C002F0074006500730074003E00
中的许多零.这是由于 nvarchar
的 2字节编码,此示例中不需要每个第二字节.但是,如果您需要远东字符,则情况将完全不同.
The only difference you'll see are the many zeros in 0x3C0074006500730074003E0062006C00610068003C002F0074006500730074003E00
. This is due to the 2-byte-encoding of nvarchar
, each second byte is not needed in this sample. But if you'd need far-east-characters the picture would be completely different.
它起作用的原因:SQL-Server非常聪明.如引擎所知,从变量到XML的转换非常简单,基础变量是 varchar
或 nvarchar
.但是最后两个演员不同.引擎必须检查二进制文件是否为有效的 nvarchar
,如果失败,将再次尝试使用 varchar
.
The reason why it works: SQL-Server is very smart. The cast from the variable to XML is rather easy, as the engine knows, that the underlying variable is varchar
or nvarchar
. But the last two casts are different. The engine has to examine the binary, whether it is a valid nvarchar
and will give it a second try with varchar
if it fails.
现在尝试将您的注册商标添加到给定的示例中.首先将其添加到第二个变量 DECLARE @ text2Byte NVARCHAR(100)= N'< test>blah®</test>';
并尝试运行它.然后将其添加到第一个变量,然后重试.
Now try to add your registered trademark to the given example. Add it first to the second variable DECLARE @text2Byte NVARCHAR(100)=N'<test>blah®</test>';
and try to run this. Then add it to the first variable and try it again.
您可以尝试的方法:
将二进制文件转换为 varchar(max)
,然后转换为 nvarchar(max)
,最后转换为 xml
.
Cast your binary to varchar(max)
, then to nvarchar(max)
and finally to xml
.
,CAST(CAST(CAST(CAST(@text1Byte AS VARBINARY(MAX)) AS VARCHAR(MAX)) AS NVARCHAR(MAX)) AS XML) AS text1Byte_XML_via_Binary
这可以工作,但是不会很快...
This will work, but it won't be fast...
这篇关于TSQL“非法XML字符";将Varbinary转换为XML时的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!