Numeric sort order for strings containing same-length integers under all collations

Question

Is it safe to assume that all collations in SQL Server will give the "expected" (i.e. numeric) sort order on strings containing integers of the same length? For example, assuming that @text contains just non-negative integers ([0-9]+), would the following snippet work for ensuring that the value does not overflow the int range, or might there be some collation where @text <= '2147483647' gives unexpected results?

IF LEN(@text) BETWEEN 1 AND 9
   OR (LEN(@text) = 10 AND @text <= '2147483647')
-- ...
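For illustration only, here is a minimal Python sketch of the same guard (the name `fits_in_int` is hypothetical). Python compares `str` values by code point, so it models a binary-style ordering in which '0' < '1' < ... < '9'; it does not reproduce SQL Server's linguistic collation rules. It cross-checks the string-only comparison against real integer arithmetic:

```python
INT_MAX_TEXT = "2147483647"  # 2**31 - 1, the upper bound of SQL Server's int


def fits_in_int(text: str) -> bool:
    """Mirror of the T-SQL guard, assuming text matches [0-9]+.

    Python's str comparison is by code point, so this models a collation
    in which '0' < '1' < ... < '9' (the assumption the question asks about).
    """
    return 1 <= len(text) <= 9 or (len(text) == 10 and text <= INT_MAX_TEXT)


# Cross-check the string-only guard against actual integer comparison.
for sample in ["0", "999999999", "1000000000", "2147483647",
               "2147483648", "9999999999", "12345678901"]:
    assert fits_in_int(sample) == (int(sample) <= 2**31 - 1), sample
```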

There won't be cases of variable-length comparisons (e.g. '2' < '11'), so please do not address that issue.

Answer

SQL Server collations do not guarantee anything about the encodings. They are mappings from binary representations of characters to the commonly understood characters.
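As a small illustration of "mappings from binary representations to characters", the sketch below uses Python's built-in codecs (not SQL Server) for two Windows code pages: the same byte 0xE0 decodes to different letters, while the ASCII digit byte 0x32 decodes to '2' in both.

```python
raw = bytes([0x32, 0xE0])  # the byte for '2' followed by the byte 0xE0

# The same bytes map to different characters depending on the code page;
# the ASCII range (which includes the digits) is shared by both.
print(raw.decode("cp1252"))  # Western European code page
print(raw.decode("cp1251"))  # Cyrillic code page
assert raw.decode("cp1252") == "2\u00e0"  # '2' + LATIN SMALL LETTER A WITH GRAVE
assert raw.decode("cp1251") == "2\u0430"  # '2' + CYRILLIC SMALL LETTER A
```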

For this purpose, I think the relevant concept is "code page":


Code Page

A code page is an ordered set of characters of a given script in which a numeric index, or code point value, is associated with each character. A Windows code page is commonly referred to as a character set or charset. Code pages are used to provide support for the character sets and keyboard layouts that are used by different Windows system locales. All Windows Server 2008 Unicode collations are Unicode 5.0-based.
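To make the "code point value" part concrete for digits, the following Python sketch (an illustration, not SQL Server behavior) checks that '0' through '9' occupy the consecutive, ascending code points U+0030 to U+0039 in Unicode, and the bytes 0x30 to 0x39 in several Windows code pages used by SQL Server collations, for example 1252 (Latin1_General), 932 (Japanese) and 936 (Chinese_PRC). Ascending code points are what a binary comparison sees; a non-binary collation sorts by its own weights, which is exactly the guarantee the question is about.

```python
# Code points of the ASCII digits in Unicode: U+0030 through U+0039.
unicode_points = [ord(d) for d in "0123456789"]
assert unicode_points == list(range(0x30, 0x3A))

# Python codec names for a few Windows code pages; each encodes the digits
# as the same consecutive single bytes 0x30-0x39.
for codec in ["cp1252", "cp1250", "cp1251", "cp932", "cp936", "cp949", "cp950"]:
    encoded = "0123456789".encode(codec)
    assert list(encoded) == list(range(0x30, 0x3A)), codec
    print(codec, encoded.hex(" "))
```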

The "ordered" piece is the part of interest for this problem. The ordering determines whether the characters are "naturally" ordered for numbers.

So, the concept of a collation does not in itself require that digits be ordered naturally.

That said, I am not aware of any collation anywhere where the digits '0' - '9' are not ordered naturally. In the Unicode standard, numbers are ordered naturally. I cannot think of a reason why anyone would create such a collation. So, in practice, I would be very, very surprised if such a collation existed. And, if it did, it would probably not be Unicode-compliant and so would not be available in SQL Server.

In practice, no such collation exists, or at least no such collation is in widespread use. Natural ordering is not guaranteed by the definition of a collation, but it is part of the Unicode character set. You are pretty safe in assuming that all collations order digits naturally, but it is theoretically possible to create a character set with a non-natural ordering of digits.
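To see why the theoretical case matters for the guard in the question, the Python sketch below models a collation as a ranking of the ten digits (the helper `collation_key` and the ordering `"9012345678"` are hypothetical; no real collation is known to rank digits this way). Under the natural ranking, comparing same-length strings position by position agrees with numeric comparison; under the permuted ranking it does not:

```python
from typing import Callable, List

DIGITS = "0123456789"


def collation_key(order: str) -> Callable[[str], List[int]]:
    """Build a sort key for digit strings from a digit ranking.

    `order` lists the ten digits from lowest to highest weight; comparing
    the resulting lists is a simplified model of a collation's sort rules.
    """
    rank = {d: i for i, d in enumerate(order)}
    return lambda s: [rank[c] for c in s]


natural = collation_key(DIGITS)
# Hypothetical collation that ranks '9' lowest.
odd = collation_key("9012345678")

a, b = "2147483647", "9000000000"  # same length, a < b numerically
assert (natural(a) <= natural(b)) == (int(a) <= int(b))  # agrees
assert (odd(a) <= odd(b)) != (int(a) <= int(b))          # disagrees

# With the natural ranking, lexicographic order equals numeric order for
# every pair of same-length strings (checked exhaustively for length 2).
two_digit = [f"{i:02d}" for i in range(100)]
assert all((natural(x) <= natural(y)) == (int(x) <= int(y))
           for x in two_digit for y in two_digit)
```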