StringComparer and Equals/== producing different results for encoded strings
Problem description
I created the following code snippet (for a programming course, so please ignore that it's not particularly useful):
string input = "Hello World";
byte[] data = Encoding.UTF32.GetBytes( input );
string garbage = Encoding.UTF8.GetString( data );
// garbage now contains 11*4 = 44 characters, of which 33 are \0's
// this test passes
Debug.Assert( input != garbage );
// we expect comparisons to produce the same result, that is, a non-zero result
// cultureCompare is 0!
int cultureCompare = StringComparer.CurrentCulture.Compare( input, garbage );
// invariantCompare is 0!
int invariantCompare = StringComparer.InvariantCulture.Compare( input, garbage );
// but ordinalCompare is 101
int ordinalCompare = StringComparer.Ordinal.Compare( input, garbage );
Is this a framework bug? If it isn't, it's at least inconsistent and undocumented behavior.
Morten Mertner,
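I can reproduce this issue on my test machine in a simple C# console application. The CurrentCulture and InvariantCulture comparers use the word comparison rules of the current and invariant cultures, whereas the Ordinal comparer does not, because it performs a non-linguistic comparison of the raw character values. Culture-sensitive comparisons ignore embedded null characters, so the 33 \0 characters in garbage are skipped and the two strings compare as equal; the ordinal comparison and the == operator (which is also ordinal) take them into account. The following two articles can help you understand this issue: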
1. From MSDN: New Recommendations for Using Strings in Microsoft .NET 2.0
- DO: Use StringComparison.Ordinal or OrdinalIgnoreCase for comparisons as your safe default for culture-agnostic string matching.
- DO: Use StringComparison.Ordinal and OrdinalIgnoreCase comparisons for increased speed.
- DO: Use StringComparison.CurrentCulture-based string operations when displaying the output to the user.
- DO: Switch current use of string operations based on the invariant culture to use the non-linguistic StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase when the comparison is linguistically irrelevant (symbolic, for example).
- DO: Use ToUpperInvariant rather than ToLowerInvariant when normalizing strings for comparison.
- DON'T: Use overloads for string operations that don't explicitly or implicitly specify the string comparison mechanism.
- DON'T: Use StringComparison.InvariantCulture-based string operations in most cases; one of the few exceptions would be persisting linguistically meaningful but culturally-agnostic data.
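As a rough illustration of these recommendations applied to the strings from the question, the sketch below always names the comparison mode explicitly. The class and method names (ComparisonSketch, MakeGarbage, SameOrdinal, SameIgnoringCase) are made up for this example and are not part of the framework. The culture-sensitive result is printed without a stated expected value, since it depends on the platform's collation rules ignoring embedded nulls.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ComparisonSketch
{
    // Hypothetical helper: rebuilds the "garbage" string from the question
    // by decoding UTF-32 bytes as if they were UTF-8.
    public static string MakeGarbage(string input)
    {
        byte[] data = Encoding.UTF32.GetBytes(input);
        return Encoding.UTF8.GetString(data);
    }

    // Explicit ordinal equality: every char, including '\0', is compared.
    public static bool SameOrdinal(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    // Explicit case-insensitive ordinal equality for non-linguistic identifiers.
    public static bool SameIgnoringCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        string input = "Hello World";
        string garbage = MakeGarbage(input);

        Console.WriteLine(garbage.Length);                  // 44
        Console.WriteLine(garbage.Count(c => c == '\0'));   // 33
        Console.WriteLine(SameOrdinal(input, garbage));     // False

        // Culture-sensitive comparison with the mode spelled out; the
        // question reports 0 here because the embedded nulls are ignored.
        Console.WriteLine(string.Compare(input, garbage, StringComparison.InvariantCulture));
    }
}
```

Spelling out StringComparison.Ordinal makes the result agree with == and Equals, which is usually what is wanted when the strings are data rather than text shown to a user.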
2. From BCL Blog: String.Compare() != String.Equals() [Josh Free]
| Data meaning | Data behavior | Corresponding StringComparison value |
| --- | --- | --- |
| Case-sensitive internal identifiers; case-sensitive identifiers in standards like XML and HTTP; case-sensitive security-related settings | A non-linguistic identifier, where bytes match exactly. | Ordinal |
| Case-insensitive internal identifiers; case-insensitive identifiers in standards like XML and HTTP; file paths; registry keys/values; environment variables; resource identifiers (handle names, for example); case-insensitive security-related settings | A non-linguistic identifier, where case is irrelevant, especially a piece of data stored in most Microsoft Windows system services. | OrdinalIgnoreCase |
| Some persisted linguistically-relevant data; display of linguistic data requiring a fixed sort order | Culturally-agnostic data, which still is linguistically relevant. | InvariantCulture or InvariantCultureIgnoreCase |
| Data displayed to the user; most user input | Data that requires local linguistic customs. | CurrentCulture or CurrentCultureIgnoreCase |
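One way to read the table in code is to pick the StringComparer when a collection is created or sorted. The following is only a sketch under that reading; the class ComparerChoiceSketch and its method names are invented for illustration, while the StringComparer properties and collection constructors are the standard ones.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerChoiceSketch
{
    // Case-insensitive, non-linguistic keys (environment-variable style names).
    public static Dictionary<string, string> CreateSettings()
    {
        return new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    }

    // Case-sensitive identifiers where the characters must match exactly.
    public static HashSet<string> CreateIdentifierSet()
    {
        return new HashSet<string>(StringComparer.Ordinal);
    }

    // Linguistic data that needs a fixed, culture-independent sort order.
    public static string[] SortForPersistence(IEnumerable<string> items)
    {
        var list = new List<string>(items);
        list.Sort(StringComparer.InvariantCulture);
        return list.ToArray();
    }

    // Data shown to the user: follow the current culture's conventions.
    public static string[] SortForDisplay(IEnumerable<string> items)
    {
        var list = new List<string>(items);
        list.Sort(StringComparer.CurrentCulture);
        return list.ToArray();
    }

    public static void Main()
    {
        var settings = CreateSettings();
        settings["PATH"] = "C:\\Tools";
        Console.WriteLine(settings.ContainsKey("path"));   // True

        var ids = CreateIdentifierSet();
        ids.Add("Id");
        Console.WriteLine(ids.Contains("id"));             // False
    }
}
```

Fixing the comparer at construction time means later lookups cannot accidentally fall back to a different comparison mode.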
Hope that helps.