Missing presentation forms (glyphs) of some Arabic characters in Unicode


Question

I am working on code that generates PDFs containing Arabic text. For each character, I choose the correct glyph from the presentation forms so that the text displays correctly. This works fine, but Unicode doesn't contain presentation forms for all Arabic characters. One example is \u067D ARABIC LETTER TEH WITH THREE DOTS ABOVE DOWNWARDS ٽ. There is no presentation form for this character even though it has a medial form, as can be seen in this string: لٽط

Why are the presentation forms of this and other characters missing? Is the character not used in practice? Can the simple ARABIC LETTER TEH, which has only one dot above and does have presentation forms, be used instead? Or is it necessary to somehow build this character (e.g. by using the \uFBB6 THREE DOTS ABOVE character)?

Recommended answer

The Arabic presentation forms should never be used for writing text. They exist only because they were needed for compatibility with older standards long ago. As such, Unicode has presentation forms only for the Arabic letters that this specific purpose required, not for all of them. Many letters were also added long after the presentation forms ceased being relevant altogether. See the FAQ on Arabic for more information.
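To see the gap for yourself, here is a minimal sketch that scans the two presentation-form blocks with Python's standard `unicodedata` module and collects the code points whose compatibility decomposition is a single given letter. The helper name `presentation_forms` is made up for this illustration, and the result depends on the Unicode database bundled with your Python version.

```python
import unicodedata

# Arabic Presentation Forms-A (U+FB50..U+FDFF) and Forms-B (U+FE70..U+FEFF).
PRESENTATION_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]

def presentation_forms(letter):
    """Hypothetical helper: map form tag -> code point for `letter`."""
    target = f"{ord(letter):04X}"
    forms = {}
    for start, end in PRESENTATION_RANGES:
        for cp in range(start, end + 1):
            parts = unicodedata.decomposition(chr(cp)).split()
            # Keep entries such as "<medial> 062A"; skip multi-letter ligatures.
            if len(parts) == 2 and parts[1] == target:
                forms[parts[0]] = f"U+{cp:04X}"
    return forms

teh = presentation_forms("\u062A")             # ARABIC LETTER TEH
teh_three_dots = presentation_forms("\u067D")  # the letter from the question

print(teh)
print(teh_three_dots)
```

With a current Unicode database, the plain TEH should come back with four forms (isolated, final, initial, medial), while U+067D should come back empty, which matches what the question observed.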

Arabic text should always be entered and stored using the regular letters (from the blocks Arabic, Arabic Supplement, and Arabic Extended-A). These letters will then automatically assume the correct shape depending on where they are situated in the word (initial, medial, or final), as can be seen in the example string you provided.
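If you already have text stored as presentation forms, one possible way to get back to the regular letters is compatibility normalization. The sketch below uses `unicodedata.normalize` with NFKC. Treat it as an illustration only: NFKC also rewrites other compatibility characters (ligatures, full-width forms and so on), so it may change more than you want. The function name `to_regular_letters` is made up for this example.

```python
import unicodedata

def to_regular_letters(text):
    """Hypothetical helper: fold presentation forms back to base letters."""
    return unicodedata.normalize("NFKC", text)

# TEH written three times with presentation forms: initial, medial, final.
legacy = "\uFE97\uFE98\uFE96"
normalized = to_regular_letters(legacy)

# Each presentation form folds back to the same regular letter U+062A.
print([f"U+{ord(ch):04X}" for ch in normalized])
```

After this step the string holds only regular letters, and choosing the contextual shapes is left to the text shaping engine and the font.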

Using the character U+FBB6 ﮶ ARABIC SYMBOL THREE DOTS ABOVE would not be appropriate in this context because it is not a combining mark. It isn’t used to build new characters, but to talk about the symbol itself in isolation. From the code chart for Arabic Presentation Forms-A:

These are spacing symbols representing Arabic letter diacritics considered in isolation, as for example as in discussions about the Arabic script.
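The "spacing, not combining" distinction can be checked from the character properties. This is a small sketch using Python's `unicodedata`, comparing U+FBB6 with U+064E ARABIC FATHA, a real combining mark. The helper `describe` is made up for this example, and it assumes a Python version whose Unicode database already includes U+FBB6.

```python
import unicodedata

def describe(ch):
    """Hypothetical helper: (general category, canonical combining class)."""
    return unicodedata.category(ch), unicodedata.combining(ch)

symbol = describe("\uFBB6")  # ARABIC SYMBOL THREE DOTS ABOVE
fatha = describe("\u064E")   # ARABIC FATHA, a genuine combining mark

print("U+FBB6:", symbol)
print("U+064E:", fatha)
```

A mark category (starting with "M") and a non-zero combining class are what you would expect from a character that attaches to a base letter. U+FBB6 should show neither, so placing it after TEH would just give two separate glyphs side by side.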

If the software you are using does not handle Arabic letter joining correctly, then there simply is no Unicode-defined way to enter the medial form of ٽ in your document. You will either have to switch to another framework entirely, or (as a last resort) encode the contextual forms you need as private-use characters in a new font, but I strongly recommend against that solution.
