Converting a Unicode string to Unicode characters in C# for Indian languages
Problem description
I need to convert a Unicode string into Unicode characters.
For example, in Tamil:
கமலி => 'க', 'ம', 'லி'
I'm able to extract the Unicode bytes, but producing the Unicode characters has become a problem.
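using System.Diagnostics;
using System.Text;

// Round-trips the string through UTF-16 bytes; the result is the
// same sequence of UTF-16 code units that the string already holds.
byte[] stringBytes = Encoding.Unicode.GetBytes("கமலி");
char[] stringChars = Encoding.Unicode.GetChars(stringBytes);
foreach (var crt in stringChars)
{
    Trace.WriteLine(crt);
}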
It gives the following result:
'க'=>0x0b95
'ம'=>0x0bae
'ல'=>0x0bb2
'ி'=>0x0bbf
So the problem is how to extract the character 'லி' as a single unit, 'லி', without splitting it into 'ல' and 'ி'.
In Indian languages it is natural to write a consonant and a vowel sign together as a single character, but this makes parsing in C# difficult.
All I need is to split the string into 3 characters.
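To see why the loop produces four items instead of three, it can help to print the Unicode general category of each UTF-16 code unit. This is a minimal sketch; the class name `CodeUnitDemo` and the method `CategoriesOf` are hypothetical names chosen for illustration, and the only library call assumed is `CharUnicodeInfo.GetUnicodeCategory(char)`.

```csharp
using System;
using System.Globalization;

public static class CodeUnitDemo
{
    // Hypothetical helper: returns the Unicode general category
    // of each UTF-16 code unit in s.
    public static UnicodeCategory[] CategoriesOf(string s)
    {
        var result = new UnicodeCategory[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            result[i] = CharUnicodeInfo.GetUnicodeCategory(s[i]);
        }
        return result;
    }

    public static void Main()
    {
        string word = "கமலி";
        // A C# string is already a sequence of UTF-16 code units, so no
        // Encoding.Unicode byte round trip is needed to inspect them.
        Console.WriteLine(word.Length); // 4
        UnicodeCategory[] cats = CategoriesOf(word);
        for (int i = 0; i < word.Length; i++)
        {
            Console.WriteLine($"U+{(int)word[i]:X4} {cats[i]}");
        }
    }
}
```

The first three code units are letters (`OtherLetter`), while U+0BBF, the Tamil vowel sign I, is a `SpacingCombiningMark`. It attaches to the preceding consonant when rendered, but as a `char` it is still a separate code unit.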
Recommended answer
To iterate over graphemes, you can use the methods of the StringInfo class (http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx).
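As one way to use those methods, a `StringInfo` instance exposes `LengthInTextElements` and `SubstringByTextElements(int, int)`, which count and slice by text element rather than by `char`. The sketch below assumes only those two members; `StringInfoDemo` and `TextElements` are hypothetical names for illustration.

```csharp
using System;
using System.Globalization;

public static class StringInfoDemo
{
    // Hypothetical helper: copies each text element of s into its own string
    // using StringInfo.LengthInTextElements and SubstringByTextElements.
    public static string[] TextElements(string s)
    {
        var info = new StringInfo(s);
        var elements = new string[info.LengthInTextElements];
        for (int i = 0; i < elements.Length; i++)
        {
            // Take exactly one text element starting at text-element index i.
            elements[i] = info.SubstringByTextElements(i, 1);
        }
        return elements;
    }

    public static void Main()
    {
        string[] parts = TextElements("கமலி");
        Console.WriteLine(parts.Length);
        Console.WriteLine(string.Join(", ", parts));
    }
}
```

For an empty string, `LengthInTextElements` is 0, so the loop never calls `SubstringByTextElements` and an empty array is returned.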
The .NET documentation calls each combination of a base character plus its combining characters a "text element", and you can iterate over text elements using a TextElementEnumerator (http://msdn.microsoft.com/en-us/library/system.globalization.textelementenumerator.aspx):
using System;
using System.Globalization;

var str = "கமலி";
var enumerator = StringInfo.GetTextElementEnumerator(str);
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}
Output:
க
ம
லி
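If the goal is to get the three characters back as a collection rather than print them, the same enumerator can be wrapped in a small helper. This is a sketch, not part of the original answer: `TextElementSplitter.Split` is a hypothetical name, and it uses `TextElementEnumerator.GetTextElement()`, which returns the current element as a `string` (whereas `Current` is typed as `object`).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementSplitter
{
    // Hypothetical helper: collects each text element of s into a list
    // using StringInfo.GetTextElementEnumerator.
    public static List<string> Split(string s)
    {
        var result = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            // GetTextElement() returns the current element as a string.
            result.Add(e.GetTextElement());
        }
        return result;
    }

    public static void Main()
    {
        List<string> chars = Split("கமலி");
        Console.WriteLine(chars.Count);
        foreach (string c in chars)
        {
            Console.WriteLine($"'{c}' ({c.Length} UTF-16 code unit(s))");
        }
    }
}
```

Note that each returned element is a `string`, not a `char`: 'லி' occupies two UTF-16 code units, so it cannot be stored in a single `char`.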