TSQL“非法XML字符";将Varbinary转换为XML时 [英] TSQL "Illegal XML Character" When Converting Varbinary to XML

查看:89
本文介绍了TSQL“非法XML字符";将Varbinary转换为XML时的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在SQL Server 2016中创建一个存储过程,该过程将以前转换为 Varbinary 的XML转换回XML,但是在转换时出现非法XML字符"错误.我找到了一种似乎可行的解决方法,但实际上无法弄清楚为什么为什么起作用,这让我感到不舒服.

I'm trying to create a stored procedure in SQL Server 2016 that converts XML that was previously converted into Varbinary back into XML, but getting an "Illegal XML character" error when converting. I've found a workaround that seems to work, but I can't actually figure out why it works, which makes me uncomfortable.

存储过程获取在SSIS中转换为二进制并插入表的 varbinary(MAX)列中的数据,并执行简单的

The stored procedure takes data that was converted to binary in SSIS and inserted into a varbinary(MAX) column in a table and performs a simple

CAST(Column AS XML)

它工作了很长时间,当最初的XML开始包含®(注册商标)符号时,我才开始看到一个问题.

It worked fine for a long time, and I only began seeing an issue when the initial XML started containing an ® (registered trademark) symbol.

现在,当我尝试将二进制文件转换为XML时,出现此错误

Now, when I attempt to convert the binary to XML I get this error

9420级16州立1行23的消息
XML解析:第1行,字符7,非法xml字符

Msg 9420, Level 16, State 1, Line 23
XML parsing: line 1, character 7, illegal xml character

但是,如果我先将二进制文件转换为 varchar(MAX),然后将其转换为XML,它似乎可以正常工作.我不了解执行中间的CAST与直接转换为XML不同时发生了什么.我主要担心的是,我不想将其添加到这种情况下,并且最终会带来意想不到的后果.

However, if I first convert the binary to varchar(MAX), then convert that to XML, it seems to work fine. I don't understand what is happening when I perform that intermediate CAST that is different than casting directly to XML. My main concern is that I don't want to add it in to account for this scenario and end up with unintended consequences.

测试代码:

DECLARE @foo VARBINARY(MAX)
DECLARE @bar VARCHAR(MAX)
DECLARE @Nbar NVARCHAR(MAX) 

--SELECT Varbinary
SET @foo = CAST( '<Test>®</Test>' AS VARBINARY(MAX)) 
SELECT @foo AsBinary


--select as binary as varchar
SET @bar = CAST(@foo AS VARCHAR(MAX))

SELECT @bar BinaryAsVarchar                             -- Correct string output

--select binary as nvarchar
SET @nbar = CAST(@foo AS NVARCHAR(MAX))
SELECT @nbar BinaryAsNvarchar                           -- Chinese characters 

--select binary as XML
SELECT TRY_CAST(@foo AS XML) BinaryAsXML                -- ILLEGAL XML character
-- SELECT CONVERT(xml, @obfoo) BinaryAsXML                    --ILLEGAL XML Character

--select BinaryAsVarcharAsXML
SELECT TRY_CAST(@bar AS XML) BinaryAsVarcharAsXML       -- Correct Output

--select BinaryAsNVarcharAsXML
SELECT TRY_CAST(@nbar AS XML) BinaryAsNvarcharAsXML     -- Chinese Characters

推荐答案

有几件事要知道:

  • SQL-Server的字符编码相当有限.有 VARCHAR ,它是 1字节编码的扩展ASCII NVARCHAR ,它是 UCS-2 (几乎与 utf-16 相同).
  • VARCHAR plain latin 用于第一组字符,并将排序规则提供的代码页映射用于第二组.
  • VARCHAR 不是utf-8 . utf-8 VARCHAR 一起使用,只要所有字符都编码为1字节即可.但是 utf-8 知道很多2字节编码(最多4字节编码)字符,这会破坏 VARCHAR 字符串的内部存储./li>
  • NVARCHAR 可以在本机上几乎与任何2字节编码字符一起使用(这意味着几乎可以与任何现有字符一起使用).但这并非完全是 utf-16 (存在3字节编码字符,这会破坏SQL Server的内部存储).
  • XML不会存储为您看到的XML字符串,而是存储为基于 NVARCHAR 值的按层次结构组织的物理表.
  • 本机存储的XML确实非常快,而任何基于文本的存储都需要事先进行非常昂贵的解析操作(一遍又一遍...).
  • 将XML存储为字符串是不好的,将XML存储为 VARCHAR 字符串则更糟.
  • VARCHAR -string-XML存储为 VARBINARY 是对您不应该做的事情的累加.
  • SQL-Server is rather limited with character encodings. There is VARCHAR, which is 1-byte-encoded extended ASCII and NVARCHAR, which is UCS-2 (almost the same as utf-16).
  • VARCHAR uses plain latin for the first set of characters and a codepage-mapping provided by the collation in use for the second set.
  • VARCHAR is not utf-8. utf-8 works with VARCHAR, as long as all characters are 1-byte-enocded. But utf-8 knows a lot of 2-byte-enocded (up to 4-byte-enocded) characters, which would break the internal storage of a VARCHAR string.
  • NVARCHAR will work with almost any 2-byte encoded character natively (that means with almost any existing character). But it is not exactly utf-16 (there are 3-byte encoded characters, which would break SQL-Servers internal storage).
  • XML is not stored as the XML-string you see, but as an hierarchically organised physical table, based on NVARCHAR values.
  • The natively stored XML is really fast, while any text-based storage will need a very expensive parse-operation in advance (over and over...).
  • Storing XML as string is bad, storing XML as VARCHAR string is even worse.
  • Storing a VARCHAR-string-XML as VARBINARY is a cummulation of things you should not do.

尝试一下:

DECLARE @text1Byte VARCHAR(100)='<test>blah</test>';
DECLARE @text2Byte NVARCHAR(100)=N'<test>blah</test>';

SELECT CAST(@text1Byte AS VARBINARY(MAX)) AS text1Byte_Binary
      ,CAST(@text2Byte AS VARBINARY(MAX)) AS text2Byte_Binary
      ,CAST(@text1Byte AS XML) AS text1Byte_XML
      ,CAST(@text2Byte AS XML) AS text2Byte_XML
      ,CAST(CAST(@text1Byte AS VARBINARY(MAX)) AS XML) AS text1Byte_XML_via_Binary
      ,CAST(CAST(@text2Byte AS VARBINARY(MAX)) AS XML) AS text2Byte_XML_via_Binary

您会看到的唯一区别是 0x3C0074006500730074003E0062006C00610068003C002F0074006500730074003E00 中的许多零.这是由于 nvarchar 2字节编码,此示例中不需要每个第二字节.但是,如果您需要远东字符,则情况将完全不同.

The only difference you'll see are the many zeros in 0x3C0074006500730074003E0062006C00610068003C002F0074006500730074003E00. This is due to the 2-byte-encoding of nvarchar, each second byte is not needed in this sample. But if you'd need far-east-characters the picture would be completely different.

它起作用的原因:SQL-Server非常聪明.如引擎所知,从变量到XML的转换非常简单,基础变量是 varchar nvarchar .但是最后两个演员不同.引擎必须检查二进制文件是否为有效的 nvarchar ,如果失败,将再次尝试使用 varchar .

The reason why it works: SQL-Server is very smart. The cast from the variable to XML is rather easy, as the engine knows, that the underlying variable is varchar or nvarchar. But the last two casts are different. The engine has to examine the binary, whether it is a valid nvarchar and will give it a second try with varchar if it fails.

现在尝试将您的注册商标添加到给定的示例中.首先将其添加到第二个变量 DECLARE @ text2Byte NVARCHAR(100)= N'< test>blah®</test>'; 并尝试运行它.然后将其添加到第一个变量,然后重试.

Now try to add your registered trademark to the given example. Add it first to the second variable DECLARE @text2Byte NVARCHAR(100)=N'<test>blah®</test>'; and try to run this. Then add it to the first variable and try it again.

您可以尝试的方法:

将二进制文件转换为 varchar(max),然后转换为 nvarchar(max),最后转换为 xml .

Cast your binary to varchar(max), then to nvarchar(max) and finally to xml.

,CAST(CAST(CAST(CAST(@text1Byte AS VARBINARY(MAX)) AS VARCHAR(MAX)) AS NVARCHAR(MAX)) AS XML) AS text1Byte_XML_via_Binary

这可以工作,但是不会很快...

This will work, but it won't be fast...

这篇关于TSQL“非法XML字符";将Varbinary转换为XML时的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆