Get Arabic initial, medial, and final character Unicode numbers

Problem Description

Hi.
I want to get the Unicode number of Arabic or Persian characters in C#.
For example, for "فارسي" I want this answer (as shown in the Windows Character Map):
ufed3 ufe8e ufead ufeb3 ufef2 (I want this)
Arabic letter Feh initial form, Arabic letter Alef final form, Arabic letter Reh isolated form, Arabic letter Seen initial form, Arabic letter Yeh final form.
I want to get these codes from that text,
but C# finds these codes for "فارسي":
u0641 u0627 u0631 u0633 u064a (I don't want this)
with this code:

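// Each char in a .NET string holds the nominal (logical) letter, e.g. U+0641 for Feh,
// no matter which contextual form is drawn on screen.
foreach (char c in text)
{
    textBox2.Text += string.Format("u{0:x4}", (int)c);
}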


How can I detect whether a character is in its initial, medial, or final form?
Please help me.

If you look these codes up in the Windows Character Map, you will understand.

Recommended Answer

As the notion of initial/medial/final/isolated forms is specific to the Perso-Arabic script (I'm not sure; maybe also to some historically related scripts), it is not universal, so I don't know of a ready-to-use method for doing what you want; I doubt there is one. (Please see: http://en.wikipedia.org/wiki/Perso-Arabic_script.) You can easily do it yourself.

Something like this:
// Broad category of a Perso-Arabic code point.
enum ArabicCharacterClass {
    Digit,
    Letter,
    Punctuation,
    Symbol, //?
    //something else?
}

// Positional (contextual) form of a letter; None for characters without one.
enum ArabicContextualForm {
    None, //?
    End,
    Middle,
    Beginning,
    Isolated
}

struct ArabicCharacterDescriptor {
    // ": this()" zero-initializes every auto-property backing field, which a
    // struct constructor must do before it assigns any property.
    public ArabicCharacterDescriptor(char codePoint) : this() {
        CodePoint = codePoint;
        //can throw exception if the code point is not Arabic (Perso-Arabic)
        //calculate the other members using the dictionaries (see below) if required
    }
    public char CodePoint { get; private set; }
    public ArabicCharacterClass CharacterClass { get; private set; }
    public ArabicContextualForm ContextualForm { get; private set; }
    public string Name { get; private set; }
    public string Din31635 { get; private set; } // DIN 31635 transliteration
    public string IPA { get; private set; }      // IPA transcription
    //something else?
    public override string ToString() {
        // whatever you want as a default ASCII string representation; could be your notation.
        // Placeholder: the code point in "U+XXXX" notation.
        return string.Format("U+{0:X4}", (int)CodePoint);
    }
}
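To fill in the ContextualForm member, note that the string only stores the nominal letters; the form is decided from the neighboring letters. Below is a minimal sketch of that decision, assuming a simplified version of the Unicode joining rules: right-joining letters (Alef, Reh, ...) connect only to the preceding letter, dual-joining letters connect on both sides, Tatweel (U+0640) forces a join, and non-spacing marks (harakat) are skipped. `ArabicJoining`, `GetForm` and `JoiningType` are hypothetical names, the letter lists are a partial hand-copied subset (the authoritative data is ArabicShaping.txt in the Unicode Character Database), and Lam-Alef ligatures are ignored.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Same enum as in the answer above.
enum ArabicContextualForm { None, End, Middle, Beginning, Isolated }

// Hypothetical: a reduced set of Unicode joining types.
enum JoiningType { NonJoining, RightJoining, DualJoining, JoinCausing, Transparent }

static class ArabicJoining
{
    // Right-joining letters connect only to the preceding letter.
    // Partial list (basic Arabic block plus Persian Jeh), NOT exhaustive.
    static readonly HashSet<char> RightJoiningLetters = new HashSet<char>
    {
        '\u0622', '\u0623', '\u0624', '\u0625', '\u0627', '\u0629',
        '\u062F', '\u0630', '\u0631', '\u0632', '\u0648', '\u0698'
    };

    // Dual-joining letters connect on both sides. Partial list, NOT exhaustive.
    static readonly HashSet<char> DualJoiningLetters = BuildDualJoining();

    static HashSet<char> BuildDualJoining()
    {
        var set = new HashSet<char> { '\u0626', '\u0628', '\u0649', '\u064A',
                                      '\u067E', '\u0686', '\u06A9', '\u06AF', '\u06CC' };
        for (char c = '\u062A'; c <= '\u062E'; c++) set.Add(c); // Teh .. Khah
        for (char c = '\u0633'; c <= '\u063A'; c++) set.Add(c); // Seen .. Ghain
        for (char c = '\u0641'; c <= '\u0647'; c++) set.Add(c); // Feh .. Heh
        return set;
    }

    public static JoiningType GetJoiningType(char c)
    {
        if (c == '\u0640') return JoiningType.JoinCausing; // Tatweel
        if (char.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            return JoiningType.Transparent;                 // harakat are skipped
        if (RightJoiningLetters.Contains(c)) return JoiningType.RightJoining;
        if (DualJoiningLetters.Contains(c)) return JoiningType.DualJoining;
        return JoiningType.NonJoining;
    }

    // Joining type of the nearest non-transparent character in direction step (-1 or +1).
    static JoiningType Neighbor(string text, int index, int step)
    {
        for (int i = index + step; i >= 0 && i < text.Length; i += step)
        {
            JoiningType t = GetJoiningType(text[i]);
            if (t != JoiningType.Transparent) return t;
        }
        return JoiningType.NonJoining; // start or end of the string
    }

    // Form of text[index]; "previous" means earlier in the string (logical order).
    public static ArabicContextualForm GetForm(string text, int index)
    {
        JoiningType self = GetJoiningType(text[index]);
        if (self != JoiningType.RightJoining && self != JoiningType.DualJoining)
            return ArabicContextualForm.None;

        JoiningType prev = Neighbor(text, index, -1);
        JoiningType next = Neighbor(text, index, +1);

        bool joinsPrev = prev == JoiningType.DualJoining || prev == JoiningType.JoinCausing;
        bool joinsNext = self == JoiningType.DualJoining &&
                         (next == JoiningType.DualJoining || next == JoiningType.RightJoining ||
                          next == JoiningType.JoinCausing);

        if (joinsPrev && joinsNext) return ArabicContextualForm.Middle;
        if (joinsPrev) return ArabicContextualForm.End;
        if (joinsNext) return ArabicContextualForm.Beginning;
        return ArabicContextualForm.Isolated;
    }
}
```

For "فارسي" (U+0641 U+0627 U+0631 U+0633 U+064A), calling `GetForm` for indices 0 through 4 gives Beginning, End, Isolated, Beginning, End, which is the initial/final/isolated/initial/final sequence from the question.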



Find out the Perso-Arabic subset of the Unicode code points (there are many, be patient):
http://unicode.org/,
http://www.unicode.org/charts/PDF/U0750.pdf,
http://www.unicode.org/charts/PDF/U08A0.pdf,
http://www.unicode.org/charts/PDF/UFB50.pdf,
http://www.unicode.org/charts/PDF/UFE70.pdf.

Traverse all this data, create an instance of ArabicCharacterDescriptor for each character, classify them all, and put them in some collection. It's best to put them all in a collection based on key-value pairs, for fast search. You may need two or more containers with the same values (say, the values will be the instances of ArabicCharacterDescriptor), but indexed with different keys, for fast search by code point, name, IPA, or whatever else.
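As one illustration of such a lookup, here is a minimal sketch of a Dictionary keyed by the nominal code point that returns the presentation form. It assumes the layout of the Arabic Presentation Forms-B block (the UFE70 chart): a letter's forms sit at consecutive code points in the order isolated, final, initial, medial, and right-joining letters such as Alef and Reh have only the first two. `ArabicPresentationForms`, `ToPresentationForm` and `Format` are hypothetical names, and the table holds only a handful of letters copied from that chart.

```csharp
using System.Collections.Generic;
using System.Text;

// Same enum as in the answer above.
enum ArabicContextualForm { None, End, Middle, Beginning, Isolated }

static class ArabicPresentationForms
{
    // Nominal letter -> (isolated form in U+FE70..U+FEFF, has initial/medial forms).
    // Hand-copied subset of the UFE70 chart; a real table would cover every letter.
    static readonly Dictionary<char, (char Isolated, bool DualJoining)> Table =
        new Dictionary<char, (char Isolated, bool DualJoining)>
        {
            ['\u0627'] = ('\uFE8D', false), // Alef
            ['\u0628'] = ('\uFE8F', true),  // Beh
            ['\u0631'] = ('\uFEAD', false), // Reh
            ['\u0633'] = ('\uFEB1', true),  // Seen
            ['\u0641'] = ('\uFED1', true),  // Feh
            ['\u0645'] = ('\uFEE1', true),  // Meem
            ['\u064A'] = ('\uFEF1', true),  // Yeh
        };

    // Forms are consecutive in this block: isolated, final, initial, medial.
    public static char ToPresentationForm(char letter, ArabicContextualForm form)
    {
        if (!Table.TryGetValue(letter, out var entry)) return letter; // not in the subset
        int offset;
        switch (form)
        {
            case ArabicContextualForm.Isolated:  offset = 0; break;
            case ArabicContextualForm.End:       offset = 1; break;
            case ArabicContextualForm.Beginning: offset = 2; break;
            case ArabicContextualForm.Middle:    offset = 3; break;
            default: return letter;
        }
        // Right-joining letters have no initial or medial form.
        if (offset >= 2 && !entry.DualJoining) return letter;
        return (char)(entry.Isolated + offset);
    }

    // Same "u{0:x4}" notation as the question, space-separated.
    public static string Format(IEnumerable<char> chars)
    {
        var sb = new StringBuilder();
        foreach (char c in chars)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.AppendFormat("u{0:x4}", (int)c);
        }
        return sb.ToString();
    }
}
```

Given the forms Beginning, End, Isolated, Beginning, End for the letters of "فارسي", this produces "ufed3 ufe8e ufead ufeb3 ufef2", the string the question asks for. Characters outside the table, and forms a letter does not have, come back unchanged.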

Please see: http://msdn.microsoft.com/en-us/library/5tbh8a42.aspx.

The collections based on key-value pairs are:
http://msdn.microsoft.com/en-us/library/f7fta44c.aspx,
http://msdn.microsoft.com/en-us/library/ms132319.aspx,
http://msdn.microsoft.com/en-us/library/xfhwa508.aspx.

The most commonly used one is System.Collections.Generic.Dictionary<TKey, TValue>.
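A minimal sketch of the "several containers, same values, different keys" idea, built with Dictionary<TKey, TValue> and LINQ's `Enumerable.ToDictionary`. `CharDescriptor` and `ArabicCharacterIndex` are hypothetical, cut-down stand-ins for ArabicCharacterDescriptor and its collections, and the three entries are hard-coded rather than read from the Unicode charts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for ArabicCharacterDescriptor.
sealed class CharDescriptor
{
    public CharDescriptor(char codePoint, string name) { CodePoint = codePoint; Name = name; }
    public char CodePoint { get; }
    public string Name { get; }
}

static class ArabicCharacterIndex
{
    // Normally produced by traversing the Unicode charts; hard-coded for illustration.
    static readonly CharDescriptor[] All =
    {
        new CharDescriptor('\u0641', "ARABIC LETTER FEH"),
        new CharDescriptor('\uFED3', "ARABIC LETTER FEH INITIAL FORM"),
        new CharDescriptor('\uFE8E', "ARABIC LETTER ALEF FINAL FORM"),
    };

    // Two containers holding the same instances, indexed by different keys.
    public static readonly Dictionary<char, CharDescriptor> ByCodePoint =
        All.ToDictionary(d => d.CodePoint);

    public static readonly Dictionary<string, CharDescriptor> ByName =
        All.ToDictionary(d => d.Name, StringComparer.OrdinalIgnoreCase);
}
```

A class is used here so that both dictionaries point at the same instances; with the struct from the answer, each dictionary would hold its own copy, which is harmless as long as the data never changes.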

Basically, that's it.

Good luck,

—SA

