StringComparer and Equals/== producing different results for encoded strings


Problem description


I created the following code snippet (for a programming course, so please ignore that it's not particularly useful):

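using System;
using System.Diagnostics;
using System.Text;

string input = "Hello World";
byte[] data = Encoding.UTF32.GetBytes( input );
// deliberately decode the UTF-32 bytes with the wrong encoding
string garbage = Encoding.UTF8.GetString( data );
// garbage now contains 11*4 = 44 characters, of which 33 are \0's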

// this test passes
Debug.Assert( input != garbage );

// we expect comparisons to produce the same result, that is, a non-zero result

// cultureCompare is 0!
int cultureCompare = StringComparer.CurrentCulture.Compare( input, garbage );

// invariantCompare is 0!
int invariantCompare = StringComparer.InvariantCulture.Compare( input, garbage );

// but ordinalCompare is 101
int ordinalCompare = StringComparer.Ordinal.Compare( input, garbage );

Is this a framework bug? If it isn't, it is at least inconsistent and undocumented behavior.
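
For anyone reproducing this, a minimal sketch along the following lines (not part of the original snippet) shows what garbage actually contains. The names nulCount and stripped are illustrative only, and the sketch assumes a C# version that supports top-level statements.

```csharp
using System;
using System.Linq;
using System.Text;

string input = "Hello World";
// Decode the UTF-32 bytes as if they were UTF-8, as in the snippet above.
string garbage = Encoding.UTF8.GetString(Encoding.UTF32.GetBytes(input));

// Count the embedded NUL characters (U+0000).
int nulCount = garbage.Count(c => c == '\0');

// Drop the NULs to see what remains of the original text.
string stripped = new string(garbage.Where(c => c != '\0').ToArray());

Console.WriteLine($"length = {garbage.Length}, NULs = {nulCount}");
Console.WriteLine($"stripped == input: {stripped == input}");
```

In other words, garbage is the original text with three NUL characters after every letter, so any comparison that skips NULs will see the two strings as identical.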

Solution

Morten Mertner,

I can reproduce this issue on my test machine in a simple C# console application. The CurrentCulture and InvariantCulture properties use the word comparison rules of the current and invariant culture respectively, whereas the Ordinal property does not, because it performs a non-linguistic string comparison. Word comparison rules treat some characters, such as the \0 characters embedded in garbage, as ignorable, which is why those two comparisons return 0 while == and the ordinal comparison do not. The following two articles can help you understand this issue:

1. From MSDN: New Recommendations for Using Strings in Microsoft .NET 2.0

  • DO: Use StringComparison.Ordinal or OrdinalIgnoreCase for comparisons as your safe default for culture-agnostic string matching.
  • DO: Use StringComparison.Ordinal and OrdinalIgnoreCase comparisons for increased speed.
  • DO: Use StringComparison.CurrentCulture-based string operations when displaying the output to the user.
  • DO: Switch current use of string operations based on the invariant culture to use the non-linguistic StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase when the comparison is linguistically irrelevant (symbolic, for example).
  • DO: Use ToUpperInvariant rather than ToLowerInvariant when normalizing strings for comparison.
  • DON'T: Use overloads for string operations that don't explicitly or implicitly specify the string comparison mechanism.
  • DON'T: Use StringComparison.InvariantCulture-based string operations in most cases; one of the few exceptions would be persisting linguistically meaningful but culturally-agnostic data.
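
As a rough illustration of the recommendations above, the following sketch passes an explicit StringComparison value instead of relying on a default. The file-name strings are made up for the example, and the sketch assumes a C# version that supports top-level statements.

```csharp
using System;

string a = "config.json";
string b = "CONFIG.JSON";

// Ordinal: compares the UTF-16 code units one by one, with no culture rules.
bool exact = string.Equals(a, b, StringComparison.Ordinal);

// OrdinalIgnoreCase: still non-linguistic, but case differences are ignored.
bool ignoreCase = string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

// When normalizing for comparison, prefer ToUpperInvariant over ToLowerInvariant.
bool viaUpper = a.ToUpperInvariant() == b.ToUpperInvariant();

// Search methods such as StartsWith also have overloads taking a StringComparison.
bool startsWith = b.StartsWith("config", StringComparison.OrdinalIgnoreCase);

Console.WriteLine($"{exact} {ignoreCase} {viaUpper} {startsWith}");
// prints "False True True True"
```

Spelling out the comparison at each call site makes the intent visible to the reader and avoids depending on whichever default a particular overload happens to use.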

2. From BCL Blog: String.Compare() != String.Equals() [Josh Free]

 

| Data meaning | Data behavior | Corresponding StringComparison value |
| --- | --- | --- |
| Case-sensitive internal identifiers; case-sensitive identifiers in standards like XML and HTTP; case-sensitive security-related settings | A non-linguistic identifier, where bytes match exactly. | Ordinal |
| Case-insensitive internal identifiers; case-insensitive identifiers in standards like XML and HTTP; file paths; registry keys/values; environment variables; resource identifiers (handle names, for example); case-insensitive security-related settings | A non-linguistic identifier, where case is irrelevant, especially a piece of data stored in most Microsoft Windows system services. | OrdinalIgnoreCase |
| Some persisted linguistically relevant data; display of linguistic data requiring a fixed sort order | Culturally agnostic data, which still is linguistically relevant. | InvariantCulture or InvariantCultureIgnoreCase |
| Data displayed to the user; most user input | Data that requires local linguistic customs. | CurrentCulture or CurrentCultureIgnoreCase |
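
The table can be read as a guide for choosing a comparer. The sketch below is one possible illustration, not taken from either article: the settings dictionary and its contents are hypothetical, and the linguistic result is only printed because it depends on the collation data of the platform the code runs on.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Case-insensitive identifiers (environment-variable-style names): OrdinalIgnoreCase.
var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Path"] = "/usr/bin"
};
bool foundPath = settings.ContainsKey("PATH");

// Identifiers that must match exactly: Ordinal.
string input = "Hello World";
string garbage = Encoding.UTF8.GetString(Encoding.UTF32.GetBytes(input));
var exactIds = new HashSet<string>(StringComparer.Ordinal) { input };
bool garbageIsKnown = exactIds.Contains(garbage);

// Linguistic comparison, shown only for contrast with the ordinal lookup above.
int linguistic = string.Compare(input, garbage, StringComparison.InvariantCulture);

Console.WriteLine($"foundPath = {foundPath}, garbageIsKnown = {garbageIsKnown}");
Console.WriteLine($"InvariantCulture compare = {linguistic}");
```

With an ordinal comparer the lookup treats input and garbage as different keys, which matches what == reports in the question.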

 

Hope that helps.

