Unicode characters causing issues in SQL Server 2005 string comparison

Problem description

This query:

select *
from op.tag
where tag = 'fussball'

Returns a result which has a tag column value of "fußball". Column "tag" is defined as nvarchar(150).

While I understand they are similar words grammatically, can anyone explain and defend this behavior? I assume it is related to the same collation settings that let you change case sensitivity on a column or table, but who would want this behavior? A unique constraint on the column also makes inserting one value fail with a constraint violation when the other already exists. How do I turn this off?

Follow-up bonus point question. Explain why this query does not return any rows:

select 1 
where 'fußball' = 'fussball'

Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:

select 1 
where 'fußball' = cast('fussball' as nvarchar)

But then again, this one does not:

select 1 
where cast('fußball' as varchar) = cast('fussball' as varchar)

I'm confused.

Recommended answer

I guess the Unicode collation set for your connection/table/database specifies that ss == ß. The varchar behavior in the bonus question (the comparison returning no rows) could be because it's on a faulty fast path, or maybe it does a binary comparison, or maybe you're not passing in the ß in the right encoding (I agree it's stupid).
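One way to narrow down the varchar/nvarchar difference is to run the same literals under explicit collations instead of relying on the database default. A sketch, assuming SQL Server 2005 or later; both collation names are built-in, but which one the asker's database actually uses is an assumption, since the question does not say. According to Microsoft's collation documentation, a SQL collation (name prefixed `SQL_`) applies Unicode rules to nvarchar data but a legacy non-Unicode sort order to varchar data, whereas a Windows collation applies the same rules to both. If the database default is a SQL collation, that is one more candidate explanation worth checking alongside the ones above. The `N` prefix makes a literal nvarchar, which has the same effect as the `cast(... as nvarchar)` in the question.

```sql
-- Compare the same pair of strings as varchar and nvarchar under two collations.
-- Assumption: the asker's default collation is unknown, so both are named explicitly.
-- An explicit COLLATE on one operand takes precedence over the literal's default.
SELECT
    CASE WHEN 'fußball' COLLATE SQL_Latin1_General_CP1_CI_AS = 'fussball'
         THEN 1 ELSE 0 END AS varchar_sql_collation,
    CASE WHEN N'fußball' COLLATE SQL_Latin1_General_CP1_CI_AS = N'fussball'
         THEN 1 ELSE 0 END AS nvarchar_sql_collation,
    CASE WHEN 'fußball' COLLATE Latin1_General_CI_AS = 'fussball'
         THEN 1 ELSE 0 END AS varchar_windows_collation,
    CASE WHEN N'fußball' COLLATE Latin1_General_CI_AS = N'fussball'
         THEN 1 ELSE 0 END AS nvarchar_windows_collation;
```

A 1 in a column means the two spellings compared as equal under that data type and collation; a 0 means they did not.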

http://unicode.org/reports/tr10/#Searching mentions that U+00DF (ß) is special-cased. Here's an insightful excerpt:


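Language-sensitive searching and matching are closely related to collation. Strings that compare as equal at some strength level are those that should be matched when doing language-sensitive matching. For example, at a primary strength, "ß" would match against "ss" according to the UCA, and "aa" would match "å" in Danish.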
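The excerpt explains why a linguistic collation matches ß with ss: the two compare as equal at primary strength. A binary collation compares code points instead, so it has no such equivalences, which answers the "How do I turn this off?" part of the question at the cost of also becoming case- and accent-sensitive ('Fussball' and 'fussball' would then be distinct too). A sketch, assuming SQL Server 2005 or later (where the `_BIN2` collations were introduced); `dbo.tag_demo` is a hypothetical table, not the asker's `op.tag`.

```sql
-- Per-query opt-out: compare code points instead of linguistic equivalents.
SELECT 1
WHERE N'fußball' = N'fussball' COLLATE Latin1_General_BIN2;  -- returns no rows

-- Per-column opt-out (hypothetical demo table): the unique constraint now
-- treats 'fußball' and 'fussball' as different values.
CREATE TABLE dbo.tag_demo (
    tag nvarchar(150) COLLATE Latin1_General_BIN2 NOT NULL UNIQUE
);

INSERT INTO dbo.tag_demo (tag) VALUES (N'fußball');
INSERT INTO dbo.tag_demo (tag) VALUES (N'fussball');  -- no unique-key violation under BIN2
```

Against the table in the question, the per-query form would be `where tag = N'fussball' COLLATE Latin1_General_BIN2`.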
