Converting a Hindi letter to its Unicode equivalent

Question

How do I convert a Hindi word typed in a text box to its equivalent Unicode values and display them in another text box?
For example:
त should be converted to '0924',
क to '0915', and so on.

Answer

There is no such thing as a "Hindi letter". Modern Hindi is written in देवनागरी (Devanāgarī), so what you have can only be a Devanāgarī letter:
http://www.unicode.org/charts/PDF/U0900.pdf,
https://en.wikipedia.org/wiki/Devanagari.
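
To illustrate that point, here is a small Java sketch. Java is used only because, like .NET, it stores strings as UTF-16; this is not the .NET API. It uses the standard Character.UnicodeBlock class to check that every code point of a Hindi word falls in the Devanagari block (U+0900 to U+097F). The class and method names are hypothetical.

```java
// Java analogue, not .NET: classify code points by Unicode block.
class DevanagariBlockCheck {
    // True if the code point lies in the Devanagari block (U+0900..U+097F).
    static boolean isDevanagari(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.DEVANAGARI;
    }

    public static void main(String[] args) {
        // "नमस्ते" written with escapes so the source encoding does not matter.
        String word = "\u0928\u092E\u0938\u094D\u0924\u0947";
        // Print each code point and whether it belongs to the Devanagari block.
        word.codePoints().forEach(cp ->
            System.out.printf("U+%04X %s%n", cp, isDevanagari(cp)));
    }
}
```

Every letter of the word reports true, whichever language (Hindi, Marathi, Nepali, ...) it happens to be written in.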

But even asking for a Devanāgarī-specific solution would be the wrong approach, because every .NET character is a Unicode character, and every Unicode character has a Unicode code point by definition. Do not confuse code points with their computer representations, which are standardized as UTFs (Unicode Transformation Formats). Code points are abstract integer values. If you look at each UTF, you will see that only UTF-32 represents every character by a number equal to its code point.
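
The difference between a code point and its encodings can be seen concretely in the following Java sketch. It is an analogue, not .NET code, and the class and helper names are hypothetical. The same code point U+0924 (त) becomes three bytes in UTF-8 and two bytes in UTF-16, in either byte order, while the code point itself stays 0924.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Formats a byte array as space-separated two-digit uppercase hex.
    static String hexBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ta = "\u0924"; // त
        System.out.println("code point: U+" + String.format("%04X", ta.codePointAt(0))); // U+0924
        System.out.println("UTF-8:    " + hexBytes(ta.getBytes(StandardCharsets.UTF_8)));    // E0 A4 A4
        System.out.println("UTF-16BE: " + hexBytes(ta.getBytes(StandardCharsets.UTF_16BE))); // 09 24
        System.out.println("UTF-16LE: " + hexBytes(ta.getBytes(StandardCharsets.UTF_16LE))); // 24 09
    }
}
```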

Therefore, the function you can use is System.Char.ConvertToUtf32: https://msdn.microsoft.com/en-us/library/z2ys180b%28v=vs.110%29.aspx.

In fact, the documentation page itself states that the return value is the code point.
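
Applied to the original question, the idea is to walk the input one code point at a time and format each one as four-digit hex. Below is a minimal Java sketch of that idea. It uses String.codePoints in place of .NET's Char.ConvertToUtf32, and toHexCodes is a hypothetical name. In a UI you would assign the returned string to the second text box.

```java
import java.util.stream.Collectors;

class HexCodes {
    // Converts each code point of the input to an uppercase hex string,
    // zero-padded to at least four digits, e.g. "त" -> "0924".
    static String toHexCodes(String text) {
        return text.codePoints()                         // iterates code points, not UTF-16 units
                   .mapToObj(cp -> String.format("%04X", cp))
                   .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // "तक" written with escapes so the source encoding does not matter.
        System.out.println(toHexCodes("\u0924\u0915")); // 0924 0915
    }
}
```

A C# version would presumably loop with Char.ConvertToUtf32(s, i), advance i by two when Char.IsSurrogatePair(s, i) is true, and format each result with ToString("X4").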

What may cause confusion is why the first argument is a string, not a char. Here is the answer: unfortunately, not every .NET char value is a real Unicode character. Some Unicode characters (a single code point each) are represented by two char values, a surrogate pair. Such pairs are needed only for code points above the BMP (Basic Multilingual Plane), so you don't have to worry about them with Devanāgarī, whose block (U+0900 to U+097F) lies entirely inside the BMP. The method's second argument is an index into the string, and the position it points to must hold either a single BMP character or the first (high) surrogate of a pair that together represent one "real" Unicode character. This ugliness is a consequence of the internal representation of .NET characters, which is based on UTF-16LE.
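
Surrogate pairs can be observed directly in Java, whose strings are UTF-16 like .NET strings. The sketch below is an analogue, not .NET code. U+1F600, an emoji outside the BMP, is an arbitrary example, and combine is a hypothetical helper that spells out the standard UTF-16 decoding formula, which Character.toCodePoint also implements.

```java
class SurrogatePairDemo {
    // Decodes a UTF-16 surrogate pair into its code point:
    // 0x10000 + (10 bits from the high surrogate << 10) + (10 bits from the low surrogate).
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        String ta = "\u0924";                                  // त, inside the BMP
        String emoji = new String(Character.toChars(0x1F600)); // above the BMP

        System.out.println(ta.length());    // 1: one char
        System.out.println(emoji.length()); // 2: a surrogate pair
        char hi = emoji.charAt(0), lo = emoji.charAt(1);
        System.out.printf("%04X %04X%n", (int) hi, (int) lo); // D83D DE00
        System.out.printf("%X%n", combine(hi, lo));           // 1F600
        System.out.printf("%X%n", emoji.codePointAt(0));      // 1F600, decoded by the library
    }
}
```

Note that emoji.length() counts UTF-16 units, not characters. That is the same reason the .NET method takes a string and an index rather than a single char.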

See also:
https://en.wikipedia.org/wiki/Basic_Multilingual_Plane,
https://en.wikipedia.org/wiki/Surrogate_pair,
http://unicode.org/faq/utf_bom.html.

—SA

