When should I use StringComparison.InvariantCulture instead of StringComparison.CurrentCulture to test string equality?

Question

Based on my understanding (see my other question), deciding whether to test string equality with ordinal or cultural rules requires taking the semantics of the comparison into account.

If the two strings being compared must be treated as raw sequences of characters (in other words, as two symbols), then an ordinal string comparison must be performed. This is the case for most string comparisons performed in server-side code.

Example: performing a user lookup by username. Here the usernames of the available users and the searched username are just symbols, not words in a specific language, so there is no need to take linguistic elements into account when comparing them. In this context, two symbols composed of different characters must be considered different, regardless of any linguistic rule.
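A minimal sketch of the username lookup described above, assuming an in-memory dictionary as the user store. The class name `UserDirectory`, the method `TryFindUser`, and the sample users are hypothetical and only serve the illustration; `StringComparer.Ordinal` is the standard .NET comparer for ordinal comparison.

```csharp
using System;
using System.Collections.Generic;

public static class UserDirectory
{
    // Hypothetical in-memory store keyed by username.
    // StringComparer.Ordinal makes key lookups compare raw UTF-16 code units,
    // with no linguistic rules applied.
    private static readonly Dictionary<string, string> Users =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            ["alice"] = "Alice Example",
            ["bob"] = "Bob Example",
        };

    // Returns true only when the username matches a key code unit by code unit.
    public static bool TryFindUser(string username, out string displayName)
        => Users.TryGetValue(username, out displayName);

    public static void Main()
    {
        Console.WriteLine(TryFindUser("alice", out _)); // True
        Console.WriteLine(TryFindUser("Alice", out _)); // False: 'A' and 'a' are different code units
    }
}
```

Passing the comparer to the dictionary, rather than relying on call-site discipline, keeps every lookup on the same comparison rule.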

If the two strings being compared must be treated as words in a specific language, then cultural rules must be taken into account during the comparison. It is entirely possible for two strings composed of different characters to be considered the same word in a certain language, based on the grammatical rules of that language.

Example: the two words strasse and straße both mean street in German. So, when comparing strings that represent German words, this grammatical rule must be taken into account and the two strings must be considered equal (think of an application for the German market where the user enters a street name and that name must be looked up in a database to find the city where the street is located).
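The street-name case can be tried out with a sketch like the one below. It passes an explicit German `CultureInfo` to `string.Compare` instead of relying on the thread culture; the class and method names are hypothetical. Whether the culture-sensitive call actually reports the two spellings as equal depends on the collation data used by the runtime and operating system, so no output is claimed for that line; it also assumes culture data for `de-DE` is available.

```csharp
using System;
using System.Globalization;

public static class StreetNameDemo
{
    // Ordinal: compares UTF-16 code units, so "ss" and "ß" can never match.
    public static bool OrdinalEquals(string x, string y)
        => string.Equals(x, y, StringComparison.Ordinal);

    // Culture-sensitive: applies the collation rules of the German (Germany) culture.
    // Assumption: "de-DE" culture data is available on this machine.
    public static bool GermanEquals(string x, string y)
    {
        CultureInfo german = CultureInfo.GetCultureInfo("de-DE");
        return string.Compare(x, y, german, CompareOptions.None) == 0;
    }

    public static void Main()
    {
        Console.WriteLine(OrdinalEquals("strasse", "straße")); // False
        // Result depends on the platform's collation data; run it rather than assume.
        Console.WriteLine(GermanEquals("strasse", "straße"));
    }
}
```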

So far, so good.

Given all of this, in which cases does using the .NET invariant culture for string equality make sense?

The point is that the invariant culture (as opposed to the German culture mentioned in the example above) is a fake culture based on American English linguistic rules. Put another way, there is no human language whose rules are based on the .NET invariant culture, so why should I compare two strings using this fictitious culture?

I know that the invariant culture is typically used to format and parse strings in machine-to-machine communication scenarios (such as the contracts exposed by a web API).

I would like to understand when calling string.Equals with StringComparison.InvariantCulture, as opposed to StringComparison.CurrentCulture (for some manually set thread culture, in order to not depend on the machine's OS configuration), really makes sense.

Recommended answer

Combining diacritics and non-normalised strings are one example. See this answer for a decent treatment with code: https://stackoverflow.com/a/31361980/2701753

In summary, for (many) 'alphabets' there are several possible Unicode (and UCS-2) representations of the same glyph (letter).

For example:

Unicode Character "á" (U+00E1) [one unicode codepoint]
Unicode Character "a" (U+0061) [followed by] Unicode Character "◌́" (U+0301) [two unicode codepoints]

so:
á
á

These are the same linguistic string (in every culture they are supposed to represent the same character) but different ordinal strings (different bytes).

So an invariant equality comparison is, in this case, like normalising the strings before comparing them.
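The following sketch puts the three comparisons side by side for the "á" example. The class and method names are hypothetical, and the expected results assume a runtime with regular globalization support (not the invariant-globalization mode, where culture-sensitive comparisons fall back to ordinal behaviour).

```csharp
using System;
using System.Text;

public static class DiacriticDemo
{
    public const string Precomposed = "\u00E1"; // "á" as a single code point
    public const string Decomposed = "a\u0301"; // "a" followed by a combining acute accent

    // Compares raw UTF-16 code units.
    public static bool OrdinalEquals(string x, string y)
        => string.Equals(x, y, StringComparison.Ordinal);

    // Compares linguistically, using the invariant culture's rules.
    public static bool InvariantEquals(string x, string y)
        => string.Equals(x, y, StringComparison.InvariantCulture);

    // Normalises both sides to Form C first, then compares code units.
    public static bool NormalizedOrdinalEquals(string x, string y)
        => string.Equals(
            x.Normalize(NormalizationForm.FormC),
            y.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);

    public static void Main()
    {
        Console.WriteLine(OrdinalEquals(Precomposed, Decomposed));           // False
        Console.WriteLine(InvariantEquals(Precomposed, Decomposed));         // True
        Console.WriteLine(NormalizedOrdinalEquals(Precomposed, Decomposed)); // True
    }
}
```

Writing the strings with `\u` escapes avoids depending on how the source file or an editor happens to encode the accented letter.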

Look up Unicode normalisation / decomposition for more info.

There are other interesting cases: ligatures, for example, as well as left-to-right and right-to-left marks, and ...

So, in summary, once you have 'interesting' alphabets in play (pretty much anything outside pure ASCII) and you are interested in any sort of comparison of the strings as linguistic items / streams of glyphs, you probably do want to go beyond ordinal comparison.

To directly answer the question: if you have a multicultural user base but still need the linguistic sensitivity described above, which culture would you choose for

StringComparison.CurrentCulture (for some manually set thread culture, in order to not depend on the machine OS configuration)

other than InvariantCulture?
