SQL Server Code Pages and Collations

Question

Is there any way in SQL Server of determining what a character in a code page would represent without actually creating a test database of that collation?

For example, if I create a test database with collation SQL_Ukrainian_CP1251_CS_AS and then run SELECT CHAR(255), it returns я.

However, if I try the following on a database with the SQL_Latin1_General_CP1_CS_AS collation:

SELECT CHAR(255) COLLATE SQL_Ukrainian_CP1251_CS_AS

it returns y, while

SELECT CHAR(255)

returns ÿ. So the value is evidently interpreted first through the database's default collation (where 255 is ÿ) and then mapped to the closest equivalent in the explicit collation (y, since CP1251 has no ÿ). Can this be avoided?
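
Outside SQL Server, the lookup the question is after amounts to decoding a single byte under a given code page. A minimal Python sketch, assuming Python's built-in cp1251/cp1252 codecs match the Windows code pages behind these collations (the SQL_Latin1_General_CP1 collations use code page 1252); the helper name char_in_code_page is hypothetical:

```python
def char_in_code_page(value: int, code_page: int) -> str:
    """Return the character a single-byte value represents in a Windows code page."""
    if not 0 <= value <= 255:
        raise ValueError("single-byte code pages only cover 0-255")
    # Python names the Windows code pages "cp1251", "cp1252", and so on.
    # Bytes the code page leaves undefined raise UnicodeDecodeError.
    return bytes([value]).decode(f"cp{code_page}")


# CHAR(255) under the two collations from the question.
print(char_in_code_page(255, 1252))  # ÿ (SQL_Latin1_General_CP1_CS_AS)
print(char_in_code_page(255, 1251))  # я (SQL_Ukrainian_CP1251_CS_AS)
```

Decoding the raw byte directly avoids the detour through the database's default collation that produces y.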

Solution

While MS SQL supports both code pages and Unicode, it unhelpfully doesn't provide any functions to convert between the two, so figuring out which character a value represents in a different code page is a pig.

I've seen two potential methods for handling conversions. One, detailed at http://www.codeguru.com/cpp/data/data-misc/values/article.php/c4571, involves bolting a custom conversion program onto the database and using that for conversions.

The other is to construct a database table consisting of

[CodePage], [ANSI Value], [UnicodeValue]

with the Unicode value stored either as the int code point of the character (converted with NCHAR()) or as the nchar itself.
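
A minimal sketch of that lookup table, using Python's built-in sqlite3 as a stand-in for SQL Server. The table name CodePageMap, the helper lookup, and the three sample rows are illustrative assumptions; chr() plays the role NCHAR() would play in T-SQL:

```python
import sqlite3
from typing import Optional

# Hypothetical table following the [CodePage], [ANSI Value], [UnicodeValue] layout,
# with the Unicode value stored as an int code point.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CodePageMap ("
    " CodePage INTEGER NOT NULL,"
    " AnsiValue INTEGER NOT NULL,"
    " UnicodeValue INTEGER NOT NULL,"
    " PRIMARY KEY (CodePage, AnsiValue))"
)
# A few sample rows; a real table would hold every mapped byte for each code page.
conn.executemany(
    "INSERT INTO CodePageMap VALUES (?, ?, ?)",
    [(1251, 0xFF, 0x044F), (1252, 0xFF, 0x00FF), (1251, 0xC0, 0x0410)],
)


def lookup(code_page: int, ansi_value: int) -> Optional[str]:
    """Return the Unicode character for ansi_value in code_page, or None if unmapped."""
    row = conn.execute(
        "SELECT UnicodeValue FROM CodePageMap WHERE CodePage = ? AND AnsiValue = ?",
        (code_page, ansi_value),
    ).fetchone()
    # Convert the stored code point to a character, as NCHAR() would in T-SQL.
    return None if row is None else chr(row[0])


print(lookup(1251, 255))  # я
print(lookup(1252, 255))  # ÿ
```

Keying the table on (CodePage, AnsiValue) lets one table serve every code page you load into it.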

You're using the collation SQL_Ukrainian_CP1251_CS_AS, which is code page 1251 (the CP1251 in the middle of the name). You can grab its translation table here: http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
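
Reading the code page off the collation name can be sketched as below. This is a heuristic that assumes SQL collation names embed it as _CP<number>_, with CP1 as shorthand for code page 1252. Windows collation names such as Cyrillic_General_CI_AS carry no CP part; inside SQL Server, COLLATIONPROPERTY(collation_name, 'CodePage') should return the code page directly for either kind. The helper name is hypothetical:

```python
import re


def code_page_from_collation(name: str) -> int:
    """Read the code page number embedded in a SQL collation name (heuristic)."""
    match = re.search(r"_CP(\d+)_", name, re.IGNORECASE)
    if match is None:
        raise ValueError(f"no code page in collation name: {name}")
    number = int(match.group(1))
    # "CP1" in the SQL_Latin1_General_CP1_* names stands for code page 1252.
    return 1252 if number == 1 else number


print(code_page_from_collation("SQL_Ukrainian_CP1251_CS_AS"))    # 1251
print(code_page_from_collation("SQL_Latin1_General_CP1_CS_AS"))  # 1252
```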

It's a TSV, so after trimming the header off the top, the raw data should import fairly cleanly.
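
Turning that file into rows for the table can be sketched as follows, assuming its layout of a hex byte, a hex Unicode code point and a trailing # comment per line, with # header lines at the top and a blank second column for undefined bytes. SAMPLE is a few illustrative lines in that format, not a copy of the file, and parse_mapping is a hypothetical helper:

```python
from typing import Iterator, Tuple

SAMPLE = """\
#    Name:     cp1251 to Unicode table
#
0x41\t0x0041\t#LATIN CAPITAL LETTER A
0x98\t      \t#UNDEFINED
0xC0\t0x0410\t#CYRILLIC CAPITAL LETTER A
0xFF\t0x044F\t#CYRILLIC SMALL LETTER YA
"""


def parse_mapping(text: str, code_page: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (CodePage, AnsiValue, UnicodeValue) rows from a mapping file."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # header, blank or comment-only line
        fields = line.split()
        if len(fields) < 2:
            continue  # byte with no Unicode mapping
        yield code_page, int(fields[0], 16), int(fields[1], 16)


rows = list(parse_mapping(SAMPLE, 1251))
print(rows)  # [(1251, 65, 65), (1251, 192, 1040), (1251, 255, 1103)]
```

Skipping unmapped bytes means a lookup for them simply finds no row, rather than returning a placeholder character.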

Personally, I'd lean towards the latter, especially on a production server, as the former may introduce instability.
