Could string comparisons really differ based on culture when the string is guaranteed not to change?
Question
I'm reading encrypted credentials/connection strings from a config file. ReSharper tells me "String.IndexOf(string) is culture-specific here" on this line:
if (line.Contains("host=")) {
    _host = line.Substring(line.IndexOf("host=") + "host=".Length, line.Length - "host=".Length);
}
...and so wants to change it to:
if (line.Contains("host=")) {
    _host = line.Substring(line.IndexOf("host=", System.StringComparison.Ordinal) + "host=".Length, line.Length - "host=".Length);
}
The token I'm searching for will always be "host=", regardless of where the app is deployed. Is it really sensible to add this "System.StringComparison.Ordinal" bit?
More importantly, could it hurt anything to use it?
Recommended answer
Absolutely. Per MSDN (http://msdn.microsoft.com/en-us/library/d93tkzah.aspx):
"This method performs a word (case-sensitive and culture-sensitive) search using the current culture."
So you may get different results if you run it under a different culture (set via the regional and language settings in Control Panel).
In this particular case you probably won't have a problem, but put an i in the search string, run it in Turkey, and it will probably ruin your day.
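For a fixed, non-linguistic token such as "host=", an ordinal search avoids the question entirely. The following is a minimal sketch, not the asker's actual code: ConfigLine and ExtractValue are hypothetical names, and it assumes the value runs from the end of the key to the end of the line.

```csharp
using System;

static class ConfigLine
{
    // Hypothetical helper: returns the text that follows `key` in `line`,
    // or null when the key is not present.
    public static string ExtractValue(string line, string key)
    {
        // Ordinal compares raw UTF-16 code units, so the result does not
        // depend on the current culture.
        int start = line.IndexOf(key, StringComparison.Ordinal);
        if (start < 0)
        {
            return null;
        }

        // Assumption: everything after the key, up to the end of the line,
        // is the value.
        return line.Substring(start + key.Length);
    }
}
```

With this sketch, ConfigLine.ExtractValue("host=db01", "host=") yields "db01", and a line without the key yields null instead of throwing.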
See MSDN: http://msdn.microsoft.com/en-us/library/ms973919.aspx
These new recommendations and APIs exist to alleviate misguided assumptions about the behavior of default string APIs. The canonical example of bugs emerging where non-linguistic string data is interpreted linguistically is the "Turkish-I" problem.
For nearly all Latin alphabets, including U.S. English, the character i (\u0069) is the lowercase version of the character I (\u0049). This casing rule quickly becomes the default for someone programming in such a culture. However, in Turkish ("tr-TR"), there exists a capital "i with a dot" character (\u0130), which is the capital version of i. Similarly, in Turkish, there is a lowercase "i without a dot" (\u0131), which capitalizes to I. This behavior occurs in the Azeri culture ("az") as well.
Therefore, assumptions normally made about capitalizing i or lowercasing I are not valid among all cultures. If the default overloads for string comparison routines are used, they will be subject to variance between cultures. For non-linguistic data, as in the following example, this can produce undesired results:
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
Console.WriteLine("Culture = {0}",
Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}",
(String.Compare("file", "FILE", true) == 0));
Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
Console.WriteLine("Culture = {0}",
Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}",
(String.Compare("file", "FILE", true) == 0));
Because I is compared differently, the results of the comparisons change when the thread culture is changed. This is the output:
Culture = English (United States)
(file == FILE) = True
Culture = Turkish (Turkey)
(file == FILE) = False
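For contrast, here is a hedged sketch of the same check done with an explicit ordinal comparison. OrdinalDemo and SameIgnoringCase are illustrative names that do not appear in the MSDN article, and constructing the "tr-TR" culture assumes the runtime has culture data available.

```csharp
using System;
using System.Globalization;
using System.Threading;

static class OrdinalDemo
{
    // Case-insensitive comparison that ignores the current culture.
    public static bool SameIgnoringCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        foreach (var name in new[] { "en-US", "tr-TR" })
        {
            // Switch the thread culture, as in the example above.
            Thread.CurrentThread.CurrentCulture = new CultureInfo(name);
            Console.WriteLine("{0}: (file == FILE) = {1}",
                name, SameIgnoringCase("file", "FILE"));
        }
    }
}
```

Because StringComparison.OrdinalIgnoreCase does not consult the thread culture, the comparison reports the two strings as equal under both cultures.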
Here is an example that does not involve case:
var s1 = "\u00E9"; // é as one precomposed character (U+00E9, ALT+0233)
var s2 = "e\u0301"; // 'e' plus combining acute accent U+0301 (two characters)
Console.WriteLine(s1.IndexOf(s2, StringComparison.Ordinal)); //-1
Console.WriteLine(s1.IndexOf(s2, StringComparison.InvariantCulture)); //0
Console.WriteLine(s1.IndexOf(s2, StringComparison.CurrentCulture)); //0