What's the character encoding used?
Odd character codes:
ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้
Question: What's the encoding of these characters?
(Tip: Try editing this question and you'll see why they're odd, LIVE)
Yeah, that's right. You see the same thing I do.
Apparently, this came from a Mac. So, with the little knowledge of the subject I have, I fired up Notepad++ and tried to view it in hex.
The result? Try it yourself: http://notepad-plus-plus.org/
Fairly obvious; What the hell?
I can understand if it is Just a Bunch of Bits in some weird proprietary binary encoding (containing stuff like color, font, etc.). But why do they show up so strangely?
Also, why doesn't Notepad++ show the original characters from the beginning? If you turn on the hex editor and then turn it off, it's like the text expands.
(Also (again), try copy-pasting the above characters twice into Notepad++. See the difference? Nothing but 0x3f and the occasional 0x20. This is also true for each individual character. As far as I know, neither a space nor a question mark looks like the above characters. But oh, I may be wrong of course..)
Here's a snippet from outlook:
EDIT: Editing these characters using UTF-8 instead of stupid ANSI actually lets you see the correct bytes.
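The 0x3f and 0x20 bytes are what you would expect if the editor converted the text to a legacy single-byte code page that has no Thai characters. Here is a minimal Python sketch of that effect. It assumes "ANSI" means Windows code page 1252, which is only a guess about the machine in question; the variable names are made up for illustration.

```python
# Assumption: "ANSI" here is Windows code page 1252 (cp1252), which has no Thai letters.
# KO KAI + MAITAIKHU, a space, then KO KAI + MAITAIKHU again.
text = "\u0e01\u0e47 \u0e01\u0e47"

# Characters that cp1252 cannot represent are replaced with "?" (0x3f);
# the space survives as 0x20.
ansi_bytes = text.encode("cp1252", errors="replace")

# UTF-8 can represent every code point, so nothing is lost.
utf8_bytes = text.encode("utf-8")

print(ansi_bytes.hex(" "))  # requires Python 3.8+ for the separator argument
print(utf8_bytes.hex(" "))
```

Once the text has been stored this way, the original characters are gone: every Thai code point has become the same question-mark byte, so it cannot be converted back.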
EDIT 2: I probably should have been more clear in what I wanted to know when I wrote the question (in my defence, I was so grossed out I just wanted to scream BRAINOVERFLOW when I saw it [the screenshot]).
EDIT 3: (copied from yahoo answer) It appears to be a thing called "stacking diacritics" using Thai characters.
Essentially the Thai character ก "ko kai" can have any of several superscripted diacritic marks such as ็ "maitaikhu". If you follow "ko kai" with "maitaikhu", the latter appears as a superscript thus: ก็
If you put further diacritics after such a combination, they'll stack thus: ก็็็็็
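To make the stacking concrete, here is a small Python sketch that builds such a sequence and looks up each code point with the standard unicodedata module. The helper name stack and the repeat count are purely illustrative; how tall the result looks depends on the font and the text renderer, which the code cannot show.

```python
import unicodedata

BASE = "\u0e01"  # THAI CHARACTER KO KAI
MARK = "\u0e47"  # THAI CHARACTER MAITAIKHU

def stack(base: str, mark: str, count: int) -> str:
    """Return the base letter followed by `count` copies of the mark (hypothetical helper)."""
    return base + mark * count

cluster = stack(BASE, MARK, 5)

# One visible "character" on screen, but six code points in the string.
print(len(cluster))

for ch in (BASE, MARK):
    # "Lo" = Letter, other; "Mn" = Mark, nonspacing (takes no horizontal space of its own).
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

Because the marks are nonspacing, a renderer draws each one on top of the previous cluster instead of advancing to the right, which is why a long run of them spills into the lines above.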
Here are the characters that will do it: http://graphemica.com/search?q=%E0%B8%81…
Easy search on gnome charmap:
U+0E01 THAI CHARACTER KO KAI
General Character Properties
In Unicode since: 1.1
Unicode category: Letter, Other
Various Useful Representations
UTF-8: 0xE0 0xB8 0x81
UTF-16: 0x0E01
C octal escaped UTF-8: \340\270\201
XML decimal entity: &#3585;
followed by (one or more of / a variation of):
U+0E47 THAI CHARACTER MAITAIKHU
General Character Properties
In Unicode since: 1.1
Unicode category: Mark, Non-Spacing
Various Useful Representations
UTF-8: 0xE0 0xB9 0x87
UTF-16: 0x0E47
C octal escaped UTF-8: \340\271\207
XML decimal entity: &#3655;
Annotations and Cross References
Alias names:
• mai taikhu
The second is a non-spacing mark decorating the first char
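If you don't have gnome charmap handy, the same representations can be reproduced with a few lines of Python. This is just a sketch: the function name describe and the dictionary keys are my own, not anything charmap exposes.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Collect charmap-style representations of one character (hypothetical helper)."""
    utf8 = ch.encode("utf-8")
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "utf8_hex": " ".join(f"0x{b:02X}" for b in utf8),
        # Big-endian UTF-16 without a BOM; both characters here fit in one 16-bit unit.
        "utf16_hex": "0x" + ch.encode("utf-16-be").hex().upper(),
        "c_octal": "".join(f"\\{b:03o}" for b in utf8),
        "xml_decimal": f"&#{ord(ch)};",
    }

for ch in ("\u0e01", "\u0e47"):
    for key, value in describe(ch).items():
        print(f"{key}: {value}")
    print()
```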