Normalized string is not the same as ToCharArray

Question

s2 is s1 normalized with NormalizationForm.FormD.
As strings, s1 and s2 look the same.
s1 and s2 have different GetHashCode values.
String.Compare reports s1 and s2 as equivalent.

s2 as a string shows the accent.
s2.ToCharArray appears to remove the accent.

Why is s2.ToCharArray different from s2 as a string?

I figured it out.
The length of s2 is 4.
The accent is split off into a separate char: U+0301 COMBINING ACUTE ACCENT (769 in decimal).
String.Compare is smart enough to figure it out.
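One way to see exactly what Normalize produced is to list every UTF-16 code unit with its hex value and Unicode category. The sketch below assumes .NET 5 or later (for top-level statements); the helper name `Describe` is illustrative, and `s2` is written out literally as the FormD spelling of "xxé" (`e` followed by U+0301) rather than computed, so it does not depend on runtime normalization support.

```csharp
using System;
using System.Globalization;
using System.Linq;

// s1 uses the precomposed é (U+00E9); s2 is its FormD spelling: e (U+0065) + U+0301.
string s1 = "xx\u00E9";
string s2 = "xxe\u0301";

// Hypothetical helper: one line per UTF-16 code unit, with its value and Unicode category.
static string Describe(string s) =>
    string.Join(Environment.NewLine,
        s.Select(c => $"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}"));

Console.WriteLine(Describe(s1));
Console.WriteLine();
Console.WriteLine(Describe(s2));
```

The last line printed for `s2` is `U+0301 NonSpacingMark`: the accent is a mark that attaches to the preceding `e`, not part of the letter.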

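What is interesting is that String.Compare figures it out but String.Contains does not. (String.Compare without a StringComparison argument is culture-sensitive, whereas Contains, Equals, and == compare ordinally, code unit by code unit.)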
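The difference comes from the comparison type. A minimal sketch, assuming .NET 5 or later and a runtime that is not in globalization-invariant mode (in that mode culture-sensitive comparisons behave ordinally, so the `CurrentCulture` line is left without a claimed output). Normalizing both strings to the same form, FormC here, is one common way to make ordinal operations such as `Contains` agree with what the user sees.

```csharp
using System;
using System.Text;

string s1 = "xx\u00E9";      // precomposed é
string s2 = "xxe\u0301";     // decomposed: e + COMBINING ACUTE ACCENT
char accent = '\u00E9';

// Ordinal operations look at UTF-16 code units one by one, so they see a difference.
Console.WriteLine(string.Equals(s1, s2, StringComparison.Ordinal));  // False
Console.WriteLine(s2.Contains(accent));  // False: s2 has no U+00E9 code unit

// Culture-sensitive comparison may treat canonically equivalent strings as equal,
// which is what the question observed; the result depends on the globalization mode.
Console.WriteLine(string.Compare(s1, s2, StringComparison.CurrentCulture));

// Normalize both sides to the same form before using ordinal operations.
string c1 = s1.Normalize(NormalizationForm.FormC);
string c2 = s2.Normalize(NormalizationForm.FormC);
Console.WriteLine(c1 == c2);             // True
Console.WriteLine(c2.Contains(accent));  // True
```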

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;

string s1 = "xxé";
string s1copy = "xxé";
string s2 = s1.Normalize(NormalizationForm.FormD);
string s2b = "xxe";
char accent = 'é';

Debug.WriteLine(s1);  // xxé
Debug.WriteLine(s2);  // xxé
Debug.WriteLine(s2b); // xxe

// Values are from the original run; on .NET Core, string hash codes are randomized per process.
Debug.WriteLine(s1.GetHashCode());      // 424384421
Debug.WriteLine(s1copy.GetHashCode());  // 424384421
Debug.WriteLine(s2.GetHashCode());      // 1057341801
Debug.WriteLine(s2b.GetHashCode());     // 1701495145

// Contains(char) is ordinal: s2 holds 'e' followed by U+0301, never U+00E9.
Debug.WriteLine(s1.Contains(accent));   // true
Debug.WriteLine(s2.Contains(accent));   // false
Debug.WriteLine(s2b.Contains(accent));  // false

// string.Compare(string, string) is culture-sensitive, so canonically equivalent strings compare as 0.
Debug.WriteLine(string.Compare(s1, s1copy).ToString());  // 0
Debug.WriteLine(string.Compare(s1, s2).ToString());      // 0
Debug.WriteLine(string.Compare(s1, s2b).ToString());     // 1
Debug.WriteLine(string.Compare(s2, s2b).ToString());     // 1

// Equals and == are ordinal, so they see the different code units.
Debug.WriteLine(s1.Equals(s1copy));  // true
Debug.WriteLine(s1.Equals(s2));      // false
Debug.WriteLine(s1.Equals(s2b));     // false
Debug.WriteLine(s2.Equals(s2b));     // false

Debug.WriteLine(s1 == s1copy);  // true
Debug.WriteLine(s1 == s2);      // false
Debug.WriteLine(s1 == s2b);     // false
Debug.WriteLine(s2 == s2b);     // false

char[] chars1  = s1.ToCharArray();
char[] chars2  = s2.ToCharArray();
char[] chars2b = s2b.ToCharArray();
Debug.WriteLine(chars1.Length.ToString());  // 3
Debug.WriteLine(chars2.Length.ToString());  // 4
Debug.WriteLine(chars2b.Length.ToString()); // 3

// Print each char followed by its numeric UTF-16 value.
static string Dump(char[] chars) => string.Join(" ", chars.Select(c => $"{c} {(int)c}"));
Debug.WriteLine(Dump(chars1));   // x 120 x 120 é 233
Debug.WriteLine(Dump(chars2));   // x 120 x 120 e 101 ́ 769
Debug.WriteLine(Dump(chars2b));  // x 120 x 120 e 101

// Arrays use reference identity for GetHashCode and ==, so these say nothing about the contents.
Debug.WriteLine(chars1.GetHashCode());   // 16098066
Debug.WriteLine(chars2.GetHashCode());   // 53324351
Debug.WriteLine(chars2b.GetHashCode());  // 50785559
Debug.WriteLine(chars1 == chars2);  // false
Debug.WriteLine(chars1 == chars2b); // false
Debug.WriteLine(chars2 == chars2b); // false
```
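The last six lines of the question compare arrays, not text. `char[]` uses reference equality for `==` and inherits `Equals` and `GetHashCode` from `object`, so two arrays with identical contents are still unequal. A short sketch of a content comparison using LINQ's `Enumerable.SequenceEqual`, assuming .NET 5 or later for top-level statements:

```csharp
using System;
using System.Linq;

char[] a = "xxe\u0301".ToCharArray();
char[] b = "xxe\u0301".ToCharArray();

// Reference comparisons: two separate array objects are never equal this way.
Console.WriteLine(a == b);              // False
Console.WriteLine(a.Equals(b));         // False

// Element-by-element comparison of the contents.
Console.WriteLine(a.SequenceEqual(b));  // True

// Turning the array back into a string loses nothing: all four code units survive.
Console.WriteLine(a.Length);                       // 4
Console.WriteLine(new string(a) == "xxe\u0301");   // True
```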

Answer

Why is s2.ToCharArray different from s2 as a string?

This occurs because of the NormalizationForm you have chosen. It decomposes xxé into x, x, e, and the combining acute accent U+0301 (the fourth char, 769).

NormalizationForm.FormD:

Indicates that a Unicode string is normalized using full canonical decomposition.
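To see what "full canonical decomposition" means in terms of length, the sketch below runs the question's `s1` through all four `NormalizationForm` values. It assumes .NET 5 or later and a runtime with normalization support (not globalization-invariant mode). Because é has no compatibility mapping beyond its canonical one, the K forms give the same lengths as their canonical counterparts here.

```csharp
using System;
using System.Text;

string s1 = "xx\u00E9";

foreach (NormalizationForm form in new[] {
    NormalizationForm.FormC, NormalizationForm.FormD,
    NormalizationForm.FormKC, NormalizationForm.FormKD })
{
    string n = s1.Normalize(form);
    Console.WriteLine($"{form}: length {n.Length}");
}
// FormC: length 3
// FormD: length 4
// FormKC: length 3
// FormKD: length 4
```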

If this is still unclear, here is a definition of Unicode composition:

In the context of Unicode, character composition is the process of replacing the code points of a base letter followed by one or more combining characters into a single precomposed character; and character decomposition is the opposite process.
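For a string like this one, composition and decomposition are inverses, which can be checked with `String.Normalize` and `String.IsNormalized`. A sketch assuming .NET 5 or later and a runtime where normalization is available (not globalization-invariant mode):

```csharp
using System;
using System.Text;

string composed = "xx\u00E9";                                     // 3 code units
string decomposed = composed.Normalize(NormalizationForm.FormD);  // 4 code units

// Composing the decomposed form gives back the original precomposed string.
string recomposed = decomposed.Normalize(NormalizationForm.FormC);
Console.WriteLine(recomposed == composed);                            // True

Console.WriteLine(composed.IsNormalized(NormalizationForm.FormC));    // True
Console.WriteLine(decomposed.IsNormalized(NormalizationForm.FormC));  // False
Console.WriteLine(decomposed.IsNormalized(NormalizationForm.FormD));  // True
```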

Essentially, you're decomposing the string to its lowest form, which is the four different characters you're seeing.
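A common use of the decomposed form is stripping accents: decompose, then drop the code units whose category is `NonSpacingMark`, which turns "xxé" into the question's `s2b` ("xxe"). The helper `RemoveCombiningMarks` is hypothetical, not a .NET API, and the sketch assumes .NET 5 or later with normalization available at runtime. It only affects characters that have a canonical decomposition; letters such as ø or ł have none and pass through unchanged.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper (not part of .NET): decompose, then drop the combining marks.
static string RemoveCombiningMarks(string s)
{
    string decomposed = s.Normalize(NormalizationForm.FormD);
    var kept = decomposed.Where(c =>
        CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
    // Recompose whatever is left so the result is back in composed form.
    return new string(kept.ToArray()).Normalize(NormalizationForm.FormC);
}

Console.WriteLine(RemoveCombiningMarks("xx\u00E9"));  // xxe
```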

Maybe it will be clearer if you try recombining the char[] into a string:

```csharp
var s2Compare = new string(chars2);
var isEq = (s2Compare == s2); // true
```
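The question also noticed that s1 and s2 have different `GetHashCode` values. `string.GetHashCode`, `string.Equals`, and the default comparer of `HashSet<string>` all work ordinally, so canonically equivalent strings land in different buckets unless they are normalized first. A minimal sketch, assuming .NET 5 or later with normalization available at runtime; `Key` is a hypothetical helper that normalizes to FormC before a string is used as a key:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: canonical form used for hashing and equality.
static string Key(string s) => s.Normalize(NormalizationForm.FormC);

string s1 = "xx\u00E9";    // precomposed
string s2 = "xxe\u0301";   // decomposed

var seen = new HashSet<string> { Key(s1) };
Console.WriteLine(seen.Contains(s2));                               // False: raw s2 differs ordinally
Console.WriteLine(seen.Contains(Key(s2)));                          // True
Console.WriteLine(Key(s1).GetHashCode() == Key(s2).GetHashCode());  // True
```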
