How to convert to "combining diacritical marks" on iOS

Question

In my app, I have characters that are followed by their "modifier diacritical marks" (e.g. "oˆ", where the "ˆ" is Unicode 0x02c6) that I want to convert into fully precomposed characters (e.g. "ô", Unicode 0x00f4). I tried using the NSString method precomposedStringWithCanonicalMapping, but after several hours of beating my head against the wall trying to figure out why it wasn't working, I discovered that it only converts "combining diacritical marks" (http://www.unicode.org/charts/PDF/U0300.pdf) into precomposed characters. OK, so all I need to do is convert all of my "modifier diacritical marks" into "combining diacritical marks", then call precomposedStringWithCanonicalMapping on the resulting string, and I'm done. This does work, but I wonder if there's a less tedious, less error-prone way to do it. Here's my NSString category method, which seems to fix most of the characters:

- (instancetype)combineDiacritics
{
    static NSDictionary<NSNumber *, NSNumber *> *sDiacriticalSubstDict; //unichar of diacritic -> unichar of combining diacritic
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        //http://www.unicode.org/charts/PDF/U0300.pdf
        sDiacriticalSubstDict = @{ @(0x02cb) : @(0x0300), @(0x00b4) : @(0x0301), @(0x02c6) : @(0x0302), @(0x02dc) : @(0x0303), @(0x02c9) : @(0x0304),   //Grave, Acute, Circumflex, Tilde, Macron
                                   @(0x00af) : @(0x0305), @(0x02d8) : @(0x0306), @(0x02d9) : @(0x0307), @(0x00a8) : @(0x0308), @(0x02c0) : @(0x0309),   //Overline, Breve, Dot above, Diaeresis, Hook above
                                   @(0x00b0) : @(0x030a), @(0x02da) : @(0x030a), @(0x02dd) : @(0x030b), @(0x02c7) : @(0x030c), @(0x02c8) : @(0x030d), @(0x02bb) : @(0x0312),   //Ring above (degree sign and U+02DA), Double Acute (U+02DD), Caron, Vertical line above, Cedilla above
                                   @(0x02bc) : @(0x0313), @(0x02bd) : @(0x0314), @(0x02b2) : @(0x0321), @(0x02d4) : @(0x0323), @(0x02b1) : @(0x0324),   //Comma above, Reversed comma above, Palatalized hook below, Dot below, Diaeresis below
                                   @(0x00b8) : @(0x0327), @(0x02db) : @(0x0328), @(0x02cc) : @(0x0329), @(0x02b7) : @(0x032b), @(0x02cd) : @(0x0331),   //Cedilla, Ogonek, Vert line below, Inverted double arch below, Macron below
                                   };
    });
    NSMutableString* __block buffer = [NSMutableString stringWithCapacity:self.length];
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock: ^(NSString* substring, NSRange substringRange, NSRange enclosingRange, BOOL* stop) {
                          NSString *newString = nil;
                          if (substring.length == 1)    //The diacriticals are all Unicode BMP.
                          {
                              unichar uniChar = [substring characterAtIndex:0];
                              unichar newUniChar = [sDiacriticalSubstDict[@(uniChar)] integerValue];
                              if (newUniChar != 0)
                              {
                                  NSLog(@"Unichar %04x => %04x", uniChar, newUniChar);
                                  newString = [NSString stringWithCharacters:&newUniChar length:1];
                              }
                          }
                          if (newString)
                              [buffer appendString:newString];
                          else
                              [buffer appendString:substring];
                      }];

    NSString *precomposedStr = [buffer precomposedStringWithCanonicalMapping];
    return precomposedStr;
}

Does anyone know of a more built-in way to make this conversion?
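
The behaviour described in the question can be reproduced in a few lines. This is only a minimal sketch for illustration, assuming a plain Foundation command-line program; the variable names are made up for the example. It shows that a base letter followed by a combining mark is composed, while the same letter followed by the spacing modifier letter is left untouched.

```objective-c
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // "o" + U+0302 COMBINING CIRCUMFLEX ACCENT: a canonical combining sequence.
        NSString *withCombiningMark = @"o\u0302";
        // "o" + U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT: two independent base characters.
        NSString *withModifierLetter = @"o\u02c6";

        NSString *composed = withCombiningMark.precomposedStringWithCanonicalMapping;
        NSString *unchanged = withModifierLetter.precomposedStringWithCanonicalMapping;

        // The combining sequence collapses to the single code unit U+00F4.
        NSLog(@"combining: length %lu, first unit %04x",
              (unsigned long)composed.length, [composed characterAtIndex:0]);
        // The modifier-letter sequence keeps both code units (U+006F, U+02C6).
        NSLog(@"modifier:  length %lu, second unit %04x",
              (unsigned long)unchanged.length, [unchanged characterAtIndex:1]);
    }
    return 0;
}
```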

Recommended answer

There is no built-in way to do this conversion because characters in the Spacing Modifier Letters block (U+02B0..U+02FF) are not intended to be used as diacritical marks. From Section 7.8 of the Unicode Standard:

They are not formally combining marks (gc=Mn or gc=Mc) and do not graphically combine with the base letter that they modify. They are base characters in their own right.

Spacing Clones of Diacritics. Some corporate standards explicitly specify spacing and nonspacing forms of combining diacritical marks, and the Unicode Standard provides matching codes for these interpretations when practical.

If you want to convert them to the combining forms, you will need to build a table (as you are already doing) from the cross references in the Spacing Modifier Letters code chart.
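
One compact way to express such a table is sketched below. It is an illustration, not a Foundation API: the function name ComposeSpacingDiacritics and the kPairs table are hypothetical, and the table covers only a handful of the cross references, so it would need to be extended from the code chart for real use. Each spacing form is replaced literally by its combining counterpart, and the result is then passed through precomposedStringWithCanonicalMapping.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces a few spacing diacritics with their combining
// counterparts, then composes the result (NFC).
static NSString *ComposeSpacingDiacritics(NSString *input) {
    // {spacing form, combining form}. Deliberately incomplete subset.
    static const unichar kPairs[][2] = {
        {0x02CB, 0x0300}, // grave
        {0x00B4, 0x0301}, // acute
        {0x02C6, 0x0302}, // circumflex
        {0x02DC, 0x0303}, // tilde
        {0x00A8, 0x0308}, // diaeresis
        {0x02C7, 0x030C}, // caron
    };

    NSMutableString *buffer = [input mutableCopy];
    for (size_t i = 0; i < sizeof(kPairs) / sizeof(kPairs[0]); i++) {
        NSString *spacing = [NSString stringWithCharacters:&kPairs[i][0] length:1];
        NSString *combining = [NSString stringWithCharacters:&kPairs[i][1] length:1];
        // NSLiteralSearch compares code units exactly, with no Unicode equivalence.
        [buffer replaceOccurrencesOfString:spacing
                                withString:combining
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, buffer.length)];
    }
    return buffer.precomposedStringWithCanonicalMapping;
}
```

With this sketch, ComposeSpacingDiacritics(@"o\u02c6") yields "ô" (U+00F4). Note that the replacement is unconditional: a spacing diacritic that was meant to stand alone, for example after a space, is also turned into a combining mark, so the input should be known to use these characters only as trailing accents.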
