SQL Server - Convert varchar to another collation (code page) to fix character encoding

Problem description

I'm querying a SQL Server database that uses the SQL_Latin1_General_CP850_BIN2 collation. One of the table rows has a varchar value that includes the plus-minus (±) character, which is decimal code 177 in the Windows-1252 code page.

When I query the table directly in SQL Server Management Studio, I get a gibberish character instead of the plus-minus (±) character in this row. When I use this table as the source in an SSIS package, the destination table (which uses the typical SQL_Latin1_General_CP1_CI_AS collation) ends up with the correct character.

I now have to build a mechanism that queries the source table directly, without SSIS. How do I do this so that I get the correct character instead of gibberish? My guess is that I need to convert or cast the column to the SQL_Latin1_General_CP1_CI_AS collation, but that isn't working: I keep getting a gibberish character.

I tried the following with no luck:

select 
columnName collate SQL_Latin1_General_CP1_CI_AS
from tableName

select 
cast (columnName as varchar(100)) collate SQL_Latin1_General_CP1_CI_AS
from tableName

select 
convert (varchar, columnName) collate SQL_Latin1_General_CP1_CI_AS
from tableName

What am I doing wrong?

Recommended answer

Character set conversion is done implicitly at the database connection level. You can force automatic conversion off in the ODBC or ADODB connection string with the parameter "Auto Translate=False". This is NOT recommended. See: https://msdn.microsoft.com/en-us/library/ms130822.aspx

SQL Server 2005 had a code page incompatibility when the database and client code pages did not match. See: https://support.microsoft.com/kb/KbView/904803

SQL Server Management Studio 2008 and later is a Unicode application. All values entered or requested are interpreted as Unicode at the application level. Conversion to and from the column collation is done implicitly. You can verify this with:

SELECT CAST(N'±' as varbinary(10)) AS Result

This returns 0xB100, which is the Unicode character U+00B1 (as entered in the Management Studio window) with its two bytes in little-endian order. You cannot turn off "Auto Translate" for Management Studio.
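
The byte order in that result can be reproduced outside SQL Server. The following is a minimal Python sketch, not part of the original answer, assuming that an nvarchar literal is held as UTF-16 little-endian, which is what the 0xB100 result reflects:

```python
# U+00B1 (plus-minus sign) encoded as UTF-16 little-endian.
# The low byte comes first, hence 0xB100 rather than 0x00B1.
plus_minus = "\u00b1"
utf16_le = plus_minus.encode("utf-16-le")
print("0x" + utf16_le.hex().upper())  # 0xB100
```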

If you specify a different collation in the select, you end up with a double conversion (with possible data loss) as long as "Auto Translate" is still active. The original character is first transformed to the new collation during the select, and the result is then "Auto Translated" to the "proper" application code page. That is why your various COLLATE tests all show the same result.
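
The double conversion can be imitated with Python's built-in codecs. This is only a sketch under stated assumptions: the codecs `cp850` and `cp1252` stand in for the two collations' code pages, and `what_client_sees` is a hypothetical helper, not anything SQL Server or its drivers expose.

```python
# Sketch of the double conversion, using Python codecs as stand-ins for
# SQL Server's code page handling (an assumption, not SQL Server's own code).
stored = b"\xf1"  # the plus-minus sign as stored in a code page 850 column


def what_client_sees(stored_bytes: bytes, target_codepage: str) -> str:
    """Hypothetical helper: re-collate on the server, then auto-translate."""
    # Step 1 (server): COLLATE converts from code page 850 to the target.
    server_bytes = stored_bytes.decode("cp850").encode(target_codepage)
    # Step 2 (client): Auto Translate decodes that code page to Unicode.
    return server_bytes.decode(target_codepage)


no_collate = what_client_sees(stored, "cp850")
with_collate = what_client_sees(stored, "cp1252")
print(no_collate == with_collate)  # True: the client shows the same character
```

Whatever the intermediate bytes are, the second step reverses the first, so the displayed character does not change.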

You can verify that specifying the collation DOES have an effect in the select by casting the result to VARBINARY instead of VARCHAR, so that the SQL Server transformation is not undone by the client before it is displayed:

SELECT cast(columnName COLLATE SQL_Latin1_General_CP850_BIN2 as varbinary(10)) from tableName
SELECT cast(columnName COLLATE SQL_Latin1_General_CP1_CI_AS as varbinary(10)) from tableName

This returns 0xF1 and 0xB1 respectively if columnName contains just the character '±'.
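
Those two byte values match the positions of the plus-minus sign in the two code pages. A short Python check, assuming the Python codecs `cp850` and `cp1252` correspond to the code pages behind the two collations:

```python
# The same character has a different single-byte code in each code page.
plus_minus = "\u00b1"
cp850_byte = plus_minus.encode("cp850")    # code page 850
cp1252_byte = plus_minus.encode("cp1252")  # code page 1252 (Windows-1252)
print(cp850_byte.hex(), cp1252_byte.hex())  # f1 b1

# Changing the collation amounts to decoding with the source code page
# and encoding with the target code page.
converted = cp850_byte.decode("cp850").encode("cp1252")
```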

You might still get the correct result and yet see a wrong character if the font you are using does not provide the proper glyph.

Please double-check the actual internal representation of your character by casting the column to VARBINARY on a proper sample, and verify whether this code indeed corresponds to the defined database collation SQL_Latin1_General_CP850_BIN2:

SELECT CAST(columnName as varbinary(10)) from tableName
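
Once the raw bytes are known, they can be read under both code pages to see which interpretation yields the intended character. The sketch below uses a hypothetical helper named `interpret`. If the query returns 0xB1 (decimal 177, the Windows-1252 code mentioned in the question), the value was presumably written as Windows-1252 bytes into the CP850 column, which would explain the gibberish; that reading is an assumption about the data, not something the original answer confirms.

```python
def interpret(raw: bytes) -> dict:
    """Hypothetical helper: decode the same bytes under both code pages."""
    return {
        "cp850": raw.decode("cp850", errors="replace"),
        "cp1252": raw.decode("cp1252", errors="replace"),
    }


# 0xF1 is the plus-minus sign in code page 850 but 'ñ' in Windows-1252.
print(interpret(b"\xf1"))
# 0xB1 is a shade block (U+2592) in code page 850
# but the plus-minus sign in Windows-1252.
print(interpret(b"\xb1"))
```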

Differences between the application collation and the database collation can go unnoticed as long as the conversion is always done the same way in and out. Trouble emerges as soon as you add a client with a different collation. Then you may find that the internal conversion is unable to match the characters correctly.

All that said, keep in mind that Management Studio is usually not the final reference when interpreting result sets. Even if the output looks like gibberish in Management Studio, it might still be correct. The question is whether the records show up correctly in your applications.
