Missing presentation forms (glyphs) of some Arabic characters in Unicode

Problem description

I am working on code that generates PDFs containing Arabic text. For each character, I choose the correct glyph from the presentation forms so that the text displays correctly. This works fine, but Unicode doesn't contain presentation forms for all Arabic characters. For example, \u067D ARABIC LETTER TEH WITH THREE DOTS ABOVE DOWNWARDS (ٽ) has no presentation forms, even though the character has a medial form, as can be seen in this string: لٽط

Why are the presentation forms of this and other characters missing? Is the character not used in practice? Can the simple ARABIC LETTER TEH, which has only two dots above and does have presentation forms, be used instead? Or is it necessary to somehow build this character (e.g. by using the \uFBB6 THREE DOTS ABOVE character)?

Recommended answer

The Arabic presentation forms should never be used for writing text. They exist only because they were needed for compatibility with older standards long ago. As such, there aren’t presentation forms for all Arabic letters in Unicode, only those necessary for this specific purpose. Many letters were also added long after the presentation forms ceased being relevant altogether. See the FAQ on Arabic for more information.
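To see which letters actually have presentation forms, one option is to scan the two presentation-form blocks and read each character's compatibility decomposition. The following is only a sketch using Python's standard `unicodedata` module; the function name `presentation_forms` is made up for this illustration, and the result depends on the Unicode version bundled with your Python.

```python
import unicodedata

# Arabic Presentation Forms-A and Arabic Presentation Forms-B
PRESENTATION_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]


def presentation_forms(letter):
    """Return {form name: presentation character} for one regular Arabic letter.

    Hypothetical helper: it relies on decomposition strings such as
    '<medial> 062A' that the Unicode Character Database assigns to
    presentation forms.
    """
    target = f"{ord(letter):04X}"
    forms = {}
    for start, end in PRESENTATION_RANGES:
        for cp in range(start, end + 1):
            parts = unicodedata.decomposition(chr(cp)).split()
            # Keep only single-letter forms; ligatures decompose to several code points.
            if len(parts) == 2 and parts[0].startswith("<") and parts[1] == target:
                forms[parts[0].strip("<>")] = chr(cp)
    return forms


if __name__ == "__main__":
    for letter in ("\u062A", "\u067D"):
        names = sorted(presentation_forms(letter))
        print(f"U+{ord(letter):04X} {unicodedata.name(letter)}: {names}")
```

For U+062A TEH this should list the isolated, final, initial, and medial forms, while for U+067D it should list nothing, which matches the situation described in the question.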

Arabic text should always be entered and stored using the regular letters (from the blocks Arabic, Arabic Supplement, and Arabic Extended-A). These letters will then automatically assume the correct shape depending on where they are situated in the word (initial, medial, or final) as can be seen in the example string you provided.
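If existing data already contains presentation forms, compatibility normalization (NFKC) is one way to get back to the regular letters. This is a minimal sketch, not a complete migration recipe: `to_regular_letters` is a name invented here, and NFKC also rewrites other compatibility characters (ligatures, full-width forms, and so on), which is assumed to be acceptable for the text in question.

```python
import unicodedata


def to_regular_letters(text):
    """Map presentation forms back to regular letters via NFKC.

    Hypothetical helper; NFKC affects every compatibility character in
    the text, not just the Arabic presentation forms.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # LAM initial form + TEH medial form + TAH final form
    shaped = "\uFEDF\uFE98\uFEC2"
    stored = to_regular_letters(shaped)
    print([f"U+{ord(ch):04X}" for ch in stored])
```

Text that already uses regular letters, such as the string from the question, passes through unchanged, and a renderer with proper Arabic shaping support picks the contextual glyphs on its own.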

Using the character U+FBB6 ﮶ ARABIC SYMBOL THREE DOTS ABOVE would not be appropriate in this context because it is not a combining mark. It isn’t used to build new characters, but to talk about the symbol itself in isolation. From the code chart for Arabic Presentation Forms-A:

These are spacing symbols representing Arabic letter diacritics considered in isolation, as for example as in discussions about the Arabic script.
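The difference between a spacing symbol and a combining mark can be checked from the character properties. The sketch below uses Python's standard `unicodedata` module and compares U+FBB6 with U+0651 ARABIC SHADDA, a real combining mark chosen here only for contrast; `is_combining_mark` is a hypothetical helper name.

```python
import unicodedata


def is_combining_mark(ch):
    """True if the general category is a mark (Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")


if __name__ == "__main__":
    for ch in ("\uFBB6", "\u0651"):
        print(
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            "category:", unicodedata.category(ch),
            "combining class:", unicodedata.combining(ch),
            "mark:", is_combining_mark(ch),
        )
```

Because U+FBB6 is not a mark, a renderer will not stack it on top of the preceding letter; it takes up its own space in the line.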

If the software you are using does not handle Arabic letter joining correctly, then there simply is no Unicode-defined way to enter the medial form of ٽ in your document. You will either have to switch to another framework entirely, or (as a last resort) encode the contextual forms you need as private-use characters in a new font, but I strongly recommend against that solution.
