Normalized string is not the same as ToCharArray
s2 is s1 normalized with NormalizationForm.FormD
Printed as strings, s1 and s2 look the same
s1 and s2 return different values from GetHashCode
String.Compare reports s1 and s2 as equal
s2 as a string shows the accent
s2.ToCharArray appears to remove the accent
Why is s2.ToCharArray different from s2 as a string?
I figured it out.
The length of s2 is 4.
The accent is not removed; it is split out as a separate char (Int16 = 769).
String.Compare is smart enough to figure it out.
What is interesting is that String.Compare figures it out but String.Contains does not.
string s1 = "xxé";
string s1copy = "xxé";
string s2 = s1.Normalize(NormalizationForm.FormD);
string s2b = "xxe";
char accent = 'é';
Debug.WriteLine(s1); // xxé
Debug.WriteLine(s2); // xxé
Debug.WriteLine(s2b); // xxe
Debug.WriteLine(s1.GetHashCode()); // 424384421
Debug.WriteLine(s1copy.GetHashCode()); // 424384421
Debug.WriteLine(s2.GetHashCode()); // 1057341801
Debug.WriteLine(s2b.GetHashCode()); // 1701495145
Debug.WriteLine(s1.Contains(accent)); // true
Debug.WriteLine(s2.Contains(accent)); // false
Debug.WriteLine(s2b.Contains(accent)); // false
Debug.WriteLine(string.Compare(s1, s1copy).ToString()); // 0
Debug.WriteLine(string.Compare(s1, s2).ToString()); // 0
Debug.WriteLine(string.Compare(s1, s2b).ToString()); // 1
Debug.WriteLine(string.Compare(s2, s2b).ToString()); // 1
Debug.WriteLine(s1.Equals(s1copy)); // true
Debug.WriteLine(s1.Equals(s2)); // false
Debug.WriteLine(s1.Equals(s2b)); // false
Debug.WriteLine(s2.Equals(s2b)); // false
Debug.WriteLine(s1 == s1copy); // true
Debug.WriteLine(s1 == s2); // false
Debug.WriteLine(s1 == s2b); // false
Debug.WriteLine(s2 == s2b); // false
char[] chars1 = s1.ToCharArray();
char[] chars2 = s2.ToCharArray();
char[] chars2b = s2b.ToCharArray();
Debug.WriteLine(chars1.Length.ToString()); // 3
Debug.WriteLine(chars2.Length.ToString()); // 4
Debug.WriteLine(chars2b.Length.ToString()); // 3
// Print each char followed by its numeric value (Select requires using System.Linq).
Debug.WriteLine(string.Join(" ", chars1.Select(c => c + " " + (Int16)c)));
// x 120 x 120 é 233
Debug.WriteLine(string.Join(" ", chars2.Select(c => c + " " + (Int16)c)));
// x 120 x 120 e 101 ́ 769
Debug.WriteLine(string.Join(" ", chars2b.Select(c => c + " " + (Int16)c)));
// x 120 x 120 e 101
Debug.WriteLine(chars1.GetHashCode()); // 16098066
Debug.WriteLine(chars2.GetHashCode()); // 53324351
Debug.WriteLine(chars2b.GetHashCode()); // 50785559
Debug.WriteLine(chars1 == chars2); // false
Debug.WriteLine(chars1 == chars2b); // false
Debug.WriteLine(chars2 == chars2b); // false
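The different results from String.Compare and String.Contains noted earlier come down to the kind of comparison each performs: string.Compare(string, string) is culture-sensitive, while ==, Equals(string) and Contains(string) compare ordinally, char by char. A minimal sketch of one way to get consistent results is to bring both strings to the same normalization form before any ordinal operation. The class name ComparisonDemo and the helper EqualsNormalized are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text;

public static class ComparisonDemo
{
    // Hypothetical helper: ordinal equality after bringing both strings to FormC.
    public static bool EqualsNormalized(string a, string b) =>
        string.Equals(a.Normalize(NormalizationForm.FormC),
                      b.Normalize(NormalizationForm.FormC),
                      StringComparison.Ordinal);

    public static void Main()
    {
        string s1 = "xx\u00E9";                              // precomposed é
        string s2 = s1.Normalize(NormalizationForm.FormD);   // e followed by U+0301

        // Ordinal comparison sees different char sequences.
        Console.WriteLine(string.Equals(s1, s2, StringComparison.Ordinal)); // False
        Console.WriteLine(EqualsNormalized(s1, s2));                        // True

        // Contains(string) is ordinal, so normalize the searched string first.
        Console.WriteLine(s2.Contains("\u00E9"));                                     // False
        Console.WriteLine(s2.Normalize(NormalizationForm.FormC).Contains("\u00E9")); // True
    }
}
```

Whether FormC or FormD is the better common form depends on the data; what matters is that both operands use the same one.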
Why is s2.ToCharArray different from s2 as a string?
This occurs because of the NormalizationForm you have chosen. FormD decomposes xxé into x, x, e, and the combining acute accent (U+0301, decimal 769). The documentation describes FormD as follows:
Indicates that a Unicode string is normalized using full canonical decomposition.
If this is still unclear, here is a definition of Unicode composition:
In the context of Unicode, character composition is the process of replacing the code points of a base letter followed by one or more combining characters into a single precomposed character; and character decomposition is the opposite process.
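Essentially, you're breaking the string down into its fully decomposed form, which is the four separate chars you're seeing. ToCharArray does not remove anything; it only exposes the chars that the rendered string displays as three glyphs.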
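To make the decomposition visible, the sketch below prints each UTF-16 code unit of the composed and decomposed strings. The class name DecompositionDemo and the helper CodeUnits are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class DecompositionDemo
{
    // Formats each UTF-16 code unit as U+XXXX so combining marks become visible.
    public static string CodeUnits(string s) =>
        string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));

    public static void Main()
    {
        string composed = "xx\u00E9";                                   // é as one precomposed char
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(CodeUnits(composed));    // U+0078 U+0078 U+00E9
        Console.WriteLine(CodeUnits(decomposed));  // U+0078 U+0078 U+0065 U+0301
    }
}
```

U+0301 is 769 in decimal, which matches the extra char reported in the question.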
It may be clearer if you try recombining the char[]:
var s2Compare = new string(chars2);
var isEq = (s2Compare == s2); // true
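Rebuilding the string from its chars gives back s2, not s1. To get back to the three-char form, the string has to be recomposed, which can be sketched with NormalizationForm.FormC. The class name RoundTripDemo and the helper Recompose are hypothetical.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: recompose base letters and combining marks where possible.
    public static string Recompose(string s) => s.Normalize(NormalizationForm.FormC);

    public static void Main()
    {
        string s1 = "xx\u00E9";
        string s2 = s1.Normalize(NormalizationForm.FormD);

        // ToCharArray copies the chars unchanged, so the rebuilt string equals s2.
        string rebuilt = new string(s2.ToCharArray());
        Console.WriteLine(rebuilt == s2);                              // True

        // FormC combines e + U+0301 back into the single char U+00E9.
        string recomposed = Recompose(s2);
        Console.WriteLine(recomposed.Length);                          // 3
        Console.WriteLine(recomposed == s1);                           // True
        Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));   // False
    }
}
```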