Converting a Unicode string to Unicode characters in C# for Indian languages

Question

I need to split a Unicode string into its individual Unicode characters.

For example, in Tamil:

கமலி => 'க', 'ம', 'லி'

I am able to extract the Unicode bytes, but producing the Unicode characters from them has become a problem.

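using System.Diagnostics;
using System.Text;

// Round-trips the string through UTF-16 bytes; the result is the same
// as "கமலி".ToCharArray(): one char per UTF-16 code unit.
byte[] stringBytes = Encoding.Unicode.GetBytes("கமலி");
char[] stringChars = Encoding.Unicode.GetChars(stringBytes);
foreach (var crt in stringChars)
{
    Trace.WriteLine(crt);
}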

It gives this result:

'க' => 0x0b95

'ம' => 0x0bae

'ல' => 0x0bb2

'ி' => 0x0bbf

So the problem is how to extract the character 'லி' as it is, without splitting it into 'ல' and 'ி'.

In Indian languages it is natural to represent a consonant together with its vowel sign as a single character, but parsing the string in C# makes this difficult.

All I need is for the string to be split into 3 characters.

Recommended answer

To iterate over graphemes, you can use the methods of the StringInfo class (http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx).
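To see why iterating char by char splits 'லி', it can help to look at the Unicode category of each UTF-16 code unit and compare the char count with the text element count. The following is only a rough sketch: CharInspector and IsCombiningMark are hypothetical names chosen for illustration, not part of the question or of .NET, and the set of categories treated as "combining" is an assumption made for this example.

```csharp
using System;
using System.Globalization;

public static class CharInspector
{
    // Hypothetical helper: true when the code unit belongs to one of the
    // Unicode mark categories, i.e. it attaches to a preceding base character.
    public static bool IsCombiningMark(char c)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
        return category == UnicodeCategory.NonSpacingMark
            || category == UnicodeCategory.SpacingCombiningMark
            || category == UnicodeCategory.EnclosingMark;
    }

    // Number of UTF-16 code units, which is what a foreach over the string visits.
    public static int CountChars(string text)
    {
        return text.Length;
    }

    // Number of text elements (base character plus its combining characters).
    public static int CountTextElements(string text)
    {
        return new StringInfo(text).LengthInTextElements;
    }
}
```

For the string in the question, CountChars reports the four chars listed in the question's output, while CountTextElements should report the three units the asker wants, because the vowel sign 'ி' is a mark that attaches to 'ல'.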

The .NET documentation calls each combination of a base character and its combining characters a "text element", and you can iterate over them using a TextElementEnumerator (http://msdn.microsoft.com/en-us/library/system.globalization.textelementenumerator.aspx):

var str = "கமலி";
var enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(str);
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

Output:

க
ம
லி
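
If the text elements are needed as a collection rather than printed one by one, the same enumerator can fill a list. This is a minimal sketch under that assumption: TextElementSplitter and Split are hypothetical names introduced here, not something the answer or .NET defines.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementSplitter
{
    // Hypothetical helper: returns each text element of the input as its own string.
    public static List<string> Split(string text)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
        {
            // GetTextElement returns the current element typed as string,
            // whereas Current returns it as object.
            elements.Add(enumerator.GetTextElement());
        }
        return elements;
    }
}
```

Calling TextElementSplitter.Split("கமலி") should yield the same three elements as the output above. How more complex clusters (for example, conjuncts formed with a virama) are grouped may differ between .NET versions, so it is worth checking the results against the target runtime.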
