Vim: UTF-8 ې character breaks displayed string
Question
I have a file with the hex content db90 3031 46, which should be displayed in Vim as "ې" followed by "01F", but I noticed that it is never displayed correctly. Then I noticed that it is the same in other places: in the terminal and in the browser I always get ې01F. Why is that? Just paste it into Google and try it yourself; you will never be able to put "0" as the next character after "ې".
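To see what those bytes actually contain, independent of how any program draws them, the following minimal Python sketch decodes them as UTF-8 and lists the code points in logical (stored) order. The variable names are illustrative only.

```python
import unicodedata

# The five bytes from the question, decoded as UTF-8.
raw = bytes.fromhex("db90 3031 46")
text = raw.decode("utf-8")

# Code points in logical order, i.e. the order they are stored in the file.
code_points = [f"U+{ord(ch):04X}" for ch in text]

for ch in text:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The Arabic letter comes first in the file and the digits follow it; only the visual order chosen by the renderer differs.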
Recommended answer
That's an Arabic character, which has right-to-left directionality, so you probably need to switch back to left-to-right mode, for example with U+200e (the left-to-right mark).
The Unicode bidirectional rules are rather complex. The behaviour you are seeing is probably caused by the fact that the Latin digits are marked EN = European number (a weak type), while letters such as F are marked L = left to right (a strong type).
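These classes can be checked with Python's standard unicodedata module. This is only a sketch for inspecting the character properties; the helper name bidi_classes is made up for illustration, and the code does not run the bidirectional algorithm itself.

```python
import unicodedata

def bidi_classes(text):
    """Return (code point, bidi class) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]

# The string from the question: Arabic letter, two digits, one Latin letter.
classes = bidi_classes("\u06d001F")
for code_point, bidi in classes:
    print(code_point, bidi)

# The left-to-right mark is itself a strong L character.
lrm_class = unicodedata.bidirectional("\u200e")
```

The Arabic letter reports AL (a strong right-to-left type), the digits report EN, and F reports L, which matches the explanation above.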
Weak types are treated differently in the Unicode specification, as this quote, which covers your particular case, shows (my emphasis):
Problematic cases may occur when a right-to-left paragraph begins with left-to-right characters, or there are nested segments of different-direction text, or there are weak characters on directional boundaries. In these cases, embeddings or directional marks may be required to get the right display.
So your code point followed by a digit renders as "ې7" (I typed that 7 in after the Arabic character, despite the fact that it shows up before it), while following it with a letter gives "ېX".
For what it's worth, the text "ې7" was generated here by inserting an HTML entity for the left-to-right mark (such as &lrm; or &#x200e;) between the two characters, the HTML equivalent of the U+200e Unicode code point.
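The same fix can be applied to the file content from the question. The sketch below, which assumes you want the mark stored in the file itself, resolves the HTML entity to U+200e with the standard html module and places it between the Arabic letter and the digits.

```python
import html

# Both spellings of the entity resolve to the same code point, U+200E.
LRM = html.unescape("&lrm;")
LRM_NUMERIC = html.unescape("&#x200e;")

# Arabic letter, left-to-right mark, then the rest of the original string.
fixed = "\u06d0" + LRM + "01F"

# UTF-8 bytes that would be written to the file.
fixed_bytes = fixed.encode("utf-8")
print(fixed_bytes.hex())
```

The mark is invisible but adds three bytes (e2 80 8e) to the file, so this only suits cases where changing the stored content is acceptable.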
If you head on over to this UTF-8 codec site and enter %u06D0%u200e7 into the decoding section, you'll see that it comes out in your desired order (removing the %u200e shows it in the order you describe in your question).
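If that site is not to hand, the same decoding can be approximated locally. The %uXXXX form is a non-standard escape (the style produced by JavaScript's legacy escape() function), so Python has no built-in decoder for it; decode_percent_u below is a hypothetical helper that handles only four-digit %u escapes.

```python
import re

def decode_percent_u(s):
    """Replace each %uXXXX escape (four hex digits) with its character."""
    return re.sub(
        r"%u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        s,
    )

with_mark = decode_percent_u("%u06D0%u200e7")
without_mark = decode_percent_u("%u06D07")

print([f"U+{ord(ch):04X}" for ch in with_mark])
print([f"U+{ord(ch):04X}" for ch in without_mark])
```

Printing the two strings in a bidi-aware terminal or editor should show the difference in visual order, while the lists of code points confirm that the logical order is the same apart from the inserted mark.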