为什么使用大小写敏感排序规则时,范围上的过滤匹配错误的大小写? [英] Why does filtering on a range match the wrong case when using a Case Sensitive collation?

查看:204
本文介绍了为什么使用大小写敏感排序规则时,范围上的过滤匹配错误的大小写?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

SQL Server标准64位与归类 SQL_Latin1_General_CP1_CS_AS



plz

  ort varchar(30)SQL_Latin1_General_CP1_CS_AS 





 选择ort,
从plz
其中
ort > ='zürich'
and ort <='zürichz'

选择此数据:

 Zürich
ZürichMülligen
Zürich80

在第二个zürich结束时没有 z 没有选择数据。

解决方案

这里有两个混乱:




  • Equality与排序



    使用等于运算符<$区分大小写的问题是更直接和明显的(例如z≠Z)。但是当使用> < 操作符排序是必需的,敏感性vs不区分大小写。排序首先由哪种类型的排序(见下一点)和确定,可能由区分大小写确定。


  • 字典顺序与二进制(即ASCII值/代码点)顺序



    字典顺序,默认,意味着将使用特定的语言字母顺序。特定语言是归类名称的一部分,例如 Latin1_General Danish 。这将在a(ASCII = 97)和A(ASCII = 65)和Z(ASCII = 90)之后。二进制排序规则(即以 _BIN _BIN2 结尾的)将使用二进制字符值, 在A和a之间。二进制排序必然是区分大小写的,因为它是一切敏感的。字典排序会将A和a组合在一起,但是只有在排序规则也区分大小写时,A才会在a之前。




p>现在让我们把这一切都回到这个问题。当前排序规则是 SQL_Latin1_General_CP1_CS_AS ,这是一个字典排序。当查看字典时,苏黎世苏黎世会被发现背靠背; Zürich只会因为资金 Z 而不会出现在 zany 之前。



查看查询返回的结果,第一个条目,苏黎世不应该在那里。我不能得到回报。由于这是区分大小写的整理,苏黎世确实在zürich之前,并且会被 WHERE 。但其他两个值,ZürichMülligenZürich80 有效返回,因为它们匹配 WHERE 。如果您希望它们不匹配,则您需要使用二进制排序规则,而不是区分大小写的字典。



要查看此操作,请运行以下:



字典(不区分大小写):

  SELECT tmp.val 
FROM(SELECT'bat'
UNION ALL
SELECT'bar'
UNION ALL
SELECT' bad'
UNION ALL
SELECT'Bar')tmp(val)
ORDER BY tmp.val COLLATE SQL_Latin1_General_CP1_CI_AS ASC;

结果:


不良



栏 - 通常(但不一定)是栏后面的栏

bat


字典(区分大小写)

  SELECT tmp.val 
FROM(SELECT'bat'
UNION ALL
SELECT'bar'
UNION ALL
SELECT'bad'
UNION ALL
SELECT'Bar')tmp(val)
ORDER BY tmp.val COLLATE SQL_Latin1_General_CP1_CS_AS ASC;

结果:




酒吧 - 酒吧必然在酒吧之前,但不在不好或b之前

酒吧

bat


二进制:

  SELECT tmp.val 
FROM(SELECT'bat'
UNION ALL
SELECT'bar'
UNION ALL
SELECT'bad'
UNION ALL
SELECT'Bar')tmp(val)
ORDER BY tmp.val COLLATE Latin1_General_BIN2 ASC;

结果:


value = 66必须在b(值= 98)之前

bad

bar

bat


< blockquote>




要更好地了解大小写是否会影响过滤和排序,请运行以下分步示例

  DECLARE @test TABLE(string VARCHAR(30)
COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL );
INSERT INTO @test(string)values('A');
INSERT INTO @test(string)values('B');
INSERT INTO @test(string)values('Y');
INSERT INTO @test(string)values('Z');
INSERT INTO @test(string)values('a');
INSERT INTO @test(string)values('b');
INSERT INTO @test(string)values('y');
INSERT INTO @test(string)values('z');

- 区分大小写,字典有序排序通过列排序规则
SELECT [string] AS [Test1] FROM @test ORDER BY string ASC;
/ * - 每个字母的大写和小写组合在一起;
- 大写始终是第一个由于区分大小写
A
a
B
b
Y
y
Z
z
* /

- 通过列排序规则区分大小写的比较
SELECT [string] AS [Test2] FROM @test WHERE string ='Y';
/ * - 使用区分大小写比较时平等是直接的
Y
* /

- 区分大小写,字典排序比较和排序列排序规则
SELECT [string] AS [Test3] FROM @test WHERE string> ='Y'ORDER BY string ASC;
/ * - 大写首先由于区分大小写
Y $ b $由
Z
z
* /

- 区分大小写,字典顺序比较和排序通过列排序
SELECT [string] AS [Test4] FROM @test WHERE string> ='y'ORDER BY string ASC;
/ * - 大写首先由于区分大小写
y
Z
z
* /

- 不区分大小写,字典顺序比较通过collat​​ion override
SELECT [string] AS [Test5] FROM @test
WHERE string> 'Y'COLLATE SQL_Latin1_General_CP1_CI_AS;
/ * - 大写字母和小写字母由于不区分大小写而相同
Z
z
* /

- 二进制排序通过排序复制
SELECT [string] AS [Test6] FROM @test ORDER BY string COLLATE Latin1_General_BIN ASC;
/ *
A
B
Y
Z
a
b
y
z
* /

- 区分大小写,字典排序比较通过列排序;
- 通过排序复制的二进制排序
SELECT [string] AS [Test7] FROM @test WHERE string> ='y'
ORDER BY string COLLATE Latin1_General_BIN ASC;
/ * - 小写字母'y'在使用字典比较时位于'Z'和'z'之前
Z
y
z
* /

- 通过排序规则重写的二进制比较;
- 通过排序复制的二进制排序
SELECT [string] AS [Test8] FROM @test WHERE string> ='y'COLLATE Latin1_General_BIN
ORDER BY string COLLATE Latin1_General_BIN ASC;
/ * - 小写字母'y'在使用二进制比较时位于所有大写之后
y
z
* /

- 不区分大小写,字典有序排序通过排序复制
SELECT [string] AS [Test9] FROM @test
ORDER BY string COLLATE SQL_Latin1_General_CP1_CI_AS ASC;
/ * - 每个字母的大写字母和小写字母组合在一起,
- 但大写字母与小写字母前缀不一致
A
a
b
B
Y
y
z
Z
* /

有关SQL Server归类信息,请参阅以下链接:




SQL Server Standard 64 Bit with collation SQL_Latin1_General_CP1_CS_AS

Table plz:

ort varchar(30)    SQL_Latin1_General_CP1_CS_AS

select  ort,
from plz
where
   ort >= 'zürich' 
   and ort <= 'zürichz'

Selects this data:

Zürich
Zürich Mülligen
Zürich 80

Without the z at the end of second zürich no data are selected which is ok. But why does it show data on case sensitive server?

解决方案

There are two confusions happening here:

  • Equality vs Sorting

    When using the equality operator = the issue of case-sensitivity is more straight-forward and obvious (e.g. "z" <> "Z"). But when using the > and < operators sorting is required and this is separate from, though influenced by, case-sensitive vs case-insensitive. Sorting is determined first by which type of ordering (see next point) and then possibly determined by case-sensitivity.

  • Dictionary order vs Binary (i.e. ASCII value / Code Point) order

    Dictionary ordering, the default, means that a particular languages alphabetical ordering will be used. The particular language is part of the collation name, such as Latin1_General or Danish. This will put "a" (ASCII = 97) and "A" (ASCII = 65) together and "Z" (ASCII = 90) after both. A binary collation (i.e. one ending in _BIN or _BIN2) will use the binary character value, which will put "Z" between "A" and "a". A binary ordering is necessarily case-sensitive as it is everything-sensitive. A dictionary ordering will group "A" and "a" together, but it will only enforce that "A" comes before "a" if the collation is also case-sensitive. A case-insensitive dictionary ordering can intermix "A" and "a" values (see final SELECT in the example).

Now let's tie this all back to the question. The current collation is SQL_Latin1_General_CP1_CS_AS which is a Dictionary ordering. And when looking in a dictionary, both Zürich and zürich are going to be found back-to-back; Zürich would not come before zany simply because it has a capital Z.

Looking at the results returned by your query, the first entry, Zürich should not be there. I cannot get that to be returned. Since this is a case-sensitive collation, Zürich does come before zürich and would be filtered out by the WHERE clause. But the other two values, Zürich Mülligen and Zürich 80 are valid to be returned as they do match the WHERE clause. If you want them to not match, then you need to use a binary collation instead of the case-sensitive dictionary one.

To see this in action, run the following:

Dictionary (Case-InSensitive):

SELECT tmp.val
FROM (SELECT 'bat'
      UNION ALL
      SELECT 'bar'
      UNION ALL
      SELECT 'bad'
      UNION ALL
      SELECT 'Bar') tmp(val)
ORDER BY tmp.val COLLATE SQL_Latin1_General_CP1_CI_AS ASC;

Results:

bad
bar
Bar -- "Bar" typically, but not necessarily, comes after "bar"
bat

Dictionary (Case-Sensitive):

SELECT tmp.val
FROM (SELECT 'bat'
      UNION ALL
      SELECT 'bar'
      UNION ALL
      SELECT 'bad'
      UNION ALL
      SELECT 'Bar') tmp(val)
ORDER BY tmp.val COLLATE SQL_Latin1_General_CP1_CS_AS ASC;

Results:

bad
Bar -- "Bar" necessarily comes before "bar", but not before "bad, or even "b"
bar
bat

Binary:

SELECT tmp.val
FROM (SELECT 'bat'
      UNION ALL
      SELECT 'bar'
      UNION ALL
      SELECT 'bad'
      UNION ALL
      SELECT 'Bar') tmp(val)
ORDER BY tmp.val COLLATE Latin1_General_BIN2 ASC;

Results:

Bar -- "Bar" ("B" value = 66) necessarily comes before "b" (value = 98)
bad
bar
bat


To get a better sense of how case-sensitivity affects filtering and sorting in general, run the following step-by-step example.

DECLARE @test TABLE (string VARCHAR(30)
                         COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL);
INSERT INTO @test (string) values ('A');
INSERT INTO @test (string) values ('B');
INSERT INTO @test (string) values ('Y');
INSERT INTO @test (string) values ('Z');
INSERT INTO @test (string) values ('a');
INSERT INTO @test (string) values ('b');
INSERT INTO @test (string) values ('y');
INSERT INTO @test (string) values ('z');

-- case-sensitive, dictionary ordered sorting via column collation
SELECT [string] AS [Test1] FROM @test ORDER BY string ASC;
/* -- upper-case and lower-case of each letter are grouped together;
   -- upper-case consistently comes first due to case-sensitive
A
a
B
b
Y
y
Z
z
*/

-- case-sensitive comparison via column collation
SELECT [string] AS [Test2] FROM @test WHERE string = 'Y';
/* -- equality is straight-forward when using case-sensitive comparison
Y
*/

-- case-sensitive, dictionary ordered comparison and sorting via column collation
SELECT [string] AS [Test3] FROM @test WHERE string >= 'Y' ORDER BY string ASC;
/*  -- upper-case comes first due to case-sensitive
Y
y
Z
z
*/

-- case-sensitive, dictionary ordered comparison and sorting via column collation
SELECT [string] AS [Test4] FROM @test WHERE string >= 'y' ORDER BY string ASC;
/* -- upper-case comes first due to case-sensitive
y
Z
z
*/

-- case-insensitive, dictionary ordered comparison via collation override
SELECT [string] AS [Test5] FROM @test
WHERE string > 'Y' COLLATE SQL_Latin1_General_CP1_CI_AS;
/* -- upper-case and lower-case are the same due to case-INsensitive
Z
z
*/

-- binary ordering via collation override
SELECT [string] AS [Test6] FROM @test ORDER BY string COLLATE Latin1_General_BIN ASC;
/*
A
B
Y
Z
a
b
y
z
*/

-- case-sensitive, dictionary ordered comparison via column collation;
-- binary ordering via collation override
SELECT [string] AS [Test7] FROM @test WHERE string >= 'y'
   ORDER BY string COLLATE Latin1_General_BIN ASC;
/* -- lower-case 'y' comes before both 'Z' and 'z' when using a dictionary comparison
Z
y
z
*/

-- binary comparison via collation override;
-- binary ordering via collation override
SELECT [string] AS [Test8] FROM @test WHERE string >= 'y' COLLATE Latin1_General_BIN
   ORDER BY string COLLATE Latin1_General_BIN ASC;
/* -- lower-case 'y' comes after all capitals when using a binary comparison
y
z
*/

-- case-insensitive, dictionary ordered sorting via collation override
SELECT [string] AS [Test9] FROM @test
ORDER BY string COLLATE SQL_Latin1_General_CP1_CI_AS ASC;
/* -- upper-case and lower-case of each letter are grouped together,
   -- but inconsistent for upper-case vs lower-case first
A
a
b
B
Y
y
z
Z
*/

See the following links for some SQL Server collation info:

这篇关于为什么使用大小写敏感排序规则时,范围上的过滤匹配错误的大小写?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆