What is the difference between "combining characters" and "modifier letters"?


Question


In the Unicode standard, there are diacritical marks, such as U+0302, COMBINING CIRCUMFLEX ACCENT (◌̂), and U+02C6, MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ). I know that combining characters are combined with the previous letter to, say, make a letter like "ô", but what are modifier letters used for? Is it just a printable representation of the combining character, and if so, how is that different from the plain U+005E, CIRCUMFLEX ACCENT (^)?

[I'm not interested in the circumflex itself, but rather in this class of characters (there seem to be many of them, as you can see here).]

Solution

What is the difference between "combining characters" and "modifier letters"?

Combining characters

Combining characters are always applied to a preceding base character. Here is an example taken from section 5.13 Rendering Nonspacing Marks of The Unicode Standard Version 11.0 – Core Specification, where a sequence of four combining characters is applied to the base character a:

Here's another example. Running this trivial Java code...

System.out.println("Base character:                 \u0930");
System.out.println("Base with combining characters: \u0930\u0903\u0951");

...yielded this output:

In this case the output was wider than the base character; one of the combining characters was placed above the base character, and the other was placed to the right of the base character.
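The difference in placement can also be checked without relying on a font. The following is a minimal sketch, assuming that the Unicode general category reported by `java.lang.Character` is a fair proxy for the rendering behavior described above: a spacing combining mark (Mc) takes up horizontal space of its own, while a nonspacing mark (Mn) does not. The class name `DevanagariMarks` and the helper `describe` are hypothetical names chosen for this illustration.

```java
public class DevanagariMarks {
    // Maps the java.lang.Character type constant to a readable label for the
    // three general categories that occur in this example.
    static String describe(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.NON_SPACING_MARK:
                return "nonspacing mark (Mn)";
            case Character.COMBINING_SPACING_MARK:
                return "spacing combining mark (Mc)";
            case Character.OTHER_LETTER:
                return "letter (Lo)";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        // The base character and the two combining characters from the example.
        int[] codePoints = {0x0930, 0x0903, 0x0951};
        for (int cp : codePoints) {
            System.out.printf("U+%04X: %s%n", cp, describe(cp));
        }
    }
}
```

U+0903 is reported as a spacing combining mark, which is consistent with it being drawn to the right of the base character, and U+0951 as a nonspacing mark, consistent with it being drawn above.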

I've provided both examples as screenshots because it can be difficult to find a font that renders the resulting glyphs correctly.

Modifier Letters

In contrast to combining characters, modifier letters are freestanding. While they also usually modify another character (normally but not necessarily the preceding character), they are base characters themselves and are visually distinct. To use your example, here is the output from a Java application printing the base character A followed by U+0302, COMBINING CIRCUMFLEX ACCENT (◌̂) and U+02C6, MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ) respectively:

A 0302: Â

A 02C6: Aˆ

The MODIFIER LETTER CIRCUMFLEX ACCENT is rendered to the right of the A whereas the COMBINING CIRCUMFLEX ACCENT is rendered above it.
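The same distinction shows up under Unicode normalization. This is a small sketch, not part of the original answer, that uses the standard `java.text.Normalizer` class: under NFC, A followed by the combining circumflex composes into the single precomposed character U+00C2, whereas A followed by the modifier letter stays as two separate characters. The class name `CircumflexComposition` and the helper `nfc` are hypothetical.

```java
import java.text.Normalizer;

public class CircumflexComposition {
    // Returns the NFC (canonically composed) form of the input string.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String withCombining = "A\u0302"; // A + COMBINING CIRCUMFLEX ACCENT
        String withModifier = "A\u02C6";  // A + MODIFIER LETTER CIRCUMFLEX ACCENT

        // The combining sequence composes to one character (U+00C2).
        System.out.println("combining: length after NFC = " + nfc(withCombining).length());

        // The modifier letter is a base character, so nothing composes.
        System.out.println("modifier:  length after NFC = " + nfc(withModifier).length());
    }
}
```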

The actual meaning (semantics) of the circumflex character as a modifier letter is context driven. For example, in French, the circumflex on the o in côté affects its pronunciation, but the circumflex on the u in sûr does not; instead it is used to visually distinguish sûr (meaning sure) from the identically pronounced sur (meaning on). In French a circumflex on o always affects pronunciation, and on u it never does.

Is it just a printable representation of the combining character...

No: the modifier letter carries meaning. In the case of the French circumflex, that meaning may be context driven, based on the letter it modifies, as described above. But the meaning can also be contained within the modifier letter itself. For example:

Modifier letters are commonly used in technical phonetic transcriptional systems, where they augment the use of combining marks to make phonetic distinctions. Some of them have been adapted into regular language orthographies as well. For example, U+02BB MODIFIER LETTER TURNED COMMA is used to represent the 'okina (glottal stop) in the orthography for Hawaiian.

That example also shows that a modifier letter need not be associated with any other character. That is never the case with combining characters.
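One practical consequence is that software treats U+02BB as a letter in its own right. The sketch below assumes that "being a single word" can be approximated by matching the regular expression `\p{L}+`; that definition, the class name `OkinaIsALetter`, and the helper `isSingleWord` are choices made for this illustration rather than anything the Unicode Standard prescribes.

```java
public class OkinaIsALetter {
    // True if the string consists only of characters in the Unicode
    // general category L (letters, which includes modifier letters).
    static boolean isSingleWord(String s) {
        return s.matches("\\p{L}+");
    }

    public static void main(String[] args) {
        String withOkina = "Hawai\u02BBi";  // uses MODIFIER LETTER TURNED COMMA
        String withApostrophe = "Hawai'i";  // uses U+0027 APOSTROPHE instead

        System.out.println("U+02BB is a letter: " + Character.isLetter(0x02BB));
        System.out.println("with okina:      " + isSingleWord(withOkina));
        System.out.println("with apostrophe: " + isSingleWord(withApostrophe));
    }
}
```

With the modifier letter the whole string is made of letters, while the ASCII apostrophe, being punctuation, splits it.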

Also note that a modifier letter is not necessarily a letter in any alphabet, and the majority of modifier letters are actually symbols (e.g. the circumflex).

How is that different from the plain U+005E, CIRCUMFLEX ACCENT (^)?

That is simply the character used to represent a circumflex accent. Unlike combining characters and modifier letters, it cannot be semantically or visually associated with any other character.
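At the level of character properties, the three circumflexes fall into three different general categories. The sketch below is an illustration only, using `Character.getType` from the standard library; the class name `ThreeCircumflexes` and the helper `category` are hypothetical.

```java
public class ThreeCircumflexes {
    // Returns the two-letter Unicode general category abbreviation for the
    // three categories relevant to this comparison.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.MODIFIER_SYMBOL:
                return "Sk";
            case Character.MODIFIER_LETTER:
                return "Lm";
            case Character.NON_SPACING_MARK:
                return "Mn";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        // U+005E CIRCUMFLEX ACCENT, U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT,
        // U+0302 COMBINING CIRCUMFLEX ACCENT
        int[] codePoints = {0x005E, 0x02C6, 0x0302};
        for (int cp : codePoints) {
            System.out.printf("U+%04X: category %s, isLetter=%b%n",
                    cp, category(cp), Character.isLetter(cp));
        }
    }
}
```

Only the modifier letter counts as a letter; the plain circumflex is a symbol and the combining circumflex is a mark.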

See the following sections in The Unicode® Standard Version 11.0 – Core Specification for lots more detail:

  • 7.8 Modifier Letters
  • 7.9 Combining Marks
