How would you get an array of Unicode code points from a .NET String?
Question
I have a list of character range restrictions that I need to check a string against, but the char type in .NET is a UTF-16 code unit, so some characters become wacky (surrogate) pairs instead. Thus, when enumerating all the chars in a string, I don't get the 32-bit Unicode code points, and some comparisons with high values fail.
I understand Unicode well enough that I could parse the bytes myself if necessary, but I'm looking for a C#/.NET Framework BCL solution. So ...
How would you convert a string to an array (int[]) of 32-bit Unicode code points?
Recommended answer
This answer is not correct. See @Virtlink's answer for the correct one.
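// Requires: using System.Collections.Generic; using System.Globalization;
static int[] ExtractScalars(string s)
{
    // Normalize (Form C by default) so composite characters are combined where possible.
    if (!s.IsNormalized())
    {
        s = s.Normalize();
    }

    List<int> chars = new List<int>((s.Length * 3) / 2);

    // Enumerates text elements (grapheme clusters), not individual code points.
    var ee = StringInfo.GetTextElementEnumerator(s);
    while (ee.MoveNext())
    {
        string e = ee.GetTextElement();
        // Only the first code point of each text element is kept.
        chars.Add(char.ConvertToUtf32(e, 0));
    }

    return chars.ToArray();
}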
Notes: Normalization is required to deal with composite characters.
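For comparison, here is a minimal sketch that walks the string by UTF-16 code unit and combines surrogate pairs, so every code point is returned rather than one value per text element. The class name CodePoints and method name ToCodePoints are hypothetical, chosen for illustration; the sketch does not normalize the input and assumes that passing unpaired surrogates through unchanged is acceptable. It is not necessarily the approach taken in the answer referred to above.

```csharp
using System;
using System.Collections.Generic;

static class CodePoints // hypothetical helper, not part of the BCL
{
    public static int[] ToCodePoints(string s)
    {
        if (s == null) throw new ArgumentNullException("s");

        // At most one code point per UTF-16 code unit.
        var result = new List<int>(s.Length);

        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
            {
                // Combine the high and low surrogate into one code point,
                // then skip the low surrogate.
                result.Add(char.ConvertToUtf32(s, i));
                i++;
            }
            else
            {
                // BMP character, or an unpaired surrogate passed through as-is
                // (an assumption; throwing here would be equally reasonable).
                result.Add(s[i]);
            }
        }

        return result.ToArray();
    }
}
```

For example, CodePoints.ToCodePoints("a\U0001F600b") yields the three values 0x61, 0x1F600 and 0x62, even though the string has a Length of 4.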