Unicode characters in Arabic


Problem description

I want to find the hex number of each char in a string.

I use two methods (substring and array).

Here is the problem:
string = "ههه"
as substrings = "ه" + "ه" + "ه" ?????
and they all have the same hex number. What can I do to find the hex number of each char?

Recommended answer

This is U+0647, "ه", Arabic Letter "Heh".

This is a feature of the Arabic writing system, fully implemented in Windows and, to the best of my knowledge, in nearly all modern operating systems: if you put some Arabic letters together, they change their glyphs to form proper connections with each other. Look: "ههه". This is the same Arabic "Heh" repeated three times, even though the string looks like three different letters.
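
The claim that the three glyphs are one and the same character can be checked by printing the code point of each character. The following is only a minimal sketch in Python (the question does not say which language is used); `code_points` is a hypothetical helper name introduced for illustration.

```python
def code_points(text):
    """Return a 'U+XXXX' label for every character in text."""
    return [f"U+{ord(ch):04X}" for ch in text]

# "ههه" written with escapes: the letter Heh typed three times.
word = "\u0647\u0647\u0647"

print(code_points(word))  # ['U+0647', 'U+0647', 'U+0647']
```

The rendered shapes differ only because the text renderer picks a glyph according to each letter's position in the word; the stored characters are identical, which is why the substring and array approaches both report the same hex number.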

Now, you may ask: how do you find that out? After all, you are not going to ask me about every single character, are you?

To do this, you should understand how Unicode and the UTFs work. Unicode is not a 16-bit encoding, as many mistakenly think. Rather, it is a standard defining a one-to-one correspondence between "characters", understood as cultural entities abstracted from their glyph forms, and integer numbers, understood as abstract mathematical integers, without any concern for their bit size or computer representation. Those numbers are called "code points". So, core Unicode does not define an encoding. Encodings can differ and are defined by the UTFs.

Now, this page is in UTF-8, which is a byte-oriented encoding with a variable number of bytes per character. So, if I simply tried to read your UTF-8 text in binary form, it would be hard to recognize the code points. The only straightforward encoding with a one-to-one correspondence between encoding words and code points is UTF-32 (UTF-32LE or UTF-32BE). However, I knew for sure that the basic Arabic letters lie in the BMP (Basic Multilingual Plane), with code points that fit in 16 bits, so each of them is a single 16-bit word in UTF-16. Sixteen more planes exist beyond the BMP, but the Arabic language is far too widely used for that; the extra planes mostly hold much more exotic writing systems.
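
To make the difference between the UTFs concrete, the sketch below encodes the same code point three ways and prints the resulting bytes. It is an illustration only: `encoded_hex` is a hypothetical helper, and U+10000 is used merely as a sample code point outside the BMP for contrast.

```python
def encoded_hex(text, encoding):
    """Encode text and return its bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

heh = "\u0647"             # ARABIC LETTER HEH, inside the BMP
beyond_bmp = chr(0x10000)  # first code point outside the BMP (assumed sample)

for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    print(encoding, "|", encoded_hex(heh, encoding),
          "|", encoded_hex(beyond_bmp, encoding))
```

For U+0647 the UTF-16LE bytes are `47 06`, that is, the code point itself in little-endian order, while UTF-8 gives `D9 87`, which does not show the code point directly. For the character beyond the BMP, UTF-16 needs two 16-bit words (a surrogate pair), and only UTF-32 still stores the code point as one word.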

So, I copied your text into a text file and saved it as UTF-16LE (in Windows jargon this is called a "Unicode" file, but in fact it is UTF-16LE), opened it in a binary editor and recognized 4 identical Arabic code points, U+0647. To find out what it is, I used the Windows application Character Map (Charmap.EXE), bundled with every version of Windows. It gave me information on the subset ("Unicode subrange") for the Arabic writing system (code points U+06XX) and on this character.
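
The same inspection can be done without a binary editor or Character Map. The sketch below is a rough equivalent, not the procedure actually used above: it builds the UTF-16LE bytes in memory instead of reading a saved file (a file saved from Windows would normally also start with a byte order mark, which is not handled here), and `describe_utf16le` is a hypothetical helper that assumes BMP-only text.

```python
import struct
import unicodedata

def describe_utf16le(raw):
    """Read raw bytes as little-endian 16-bit words and name each one.

    Assumption: the text is BMP-only (no surrogate pairs), which holds
    for the U+06xx Arabic letters discussed here.
    """
    count = len(raw) // 2
    units = struct.unpack(f"<{count}H", raw[:count * 2])
    return [(f"U+{u:04X}", unicodedata.name(chr(u), "<no name>")) for u in units]

raw = "\u0647\u0647\u0647".encode("utf-16-le")

# What a binary editor would show for the three letters:
print(" ".join(f"{b:02X}" for b in raw))  # 47 06 47 06 47 06

for label, name in describe_utf16le(raw):
    print(label, name)  # U+0647 ARABIC LETTER HEH
```

`unicodedata.name` plays the role of Character Map here: it returns the official Unicode name of a character.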

Learn about it:

http://en.wikipedia.org/wiki/Unicode
http://unicode.org/

http://en.wikipedia.org/wiki/Code_point
http://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane
http://en.wikipedia.org/wiki/UTF
http://unicode.org/faq/utf_bom.html

—SA

