How to handle Combining Diacritical Marks with UnicodeUtils?

Views: 137 | Posted: 2020/7/12 18:48:47 | Tags: ruby unicode diacritics unicode-normalization phonetics


Problem description

I am trying to insert spaces into a string of IPA characters, e.g. to turn ɔ̃wɔ̃tɨ into ɔ̃ w ɔ̃ t ɨ. Using split/join was my first thought:

s = "ɔ̃w̃ɔtɨ"
s.split('').join(' ') #=> ɔ ̃ w ̃ ɔ t ɨ

As I discovered by examining the result, a letter with a diacritic is in fact encoded as two code points: the base letter followed by a combining mark. After some research I found the UnicodeUtils module and used its each_grapheme method:

graphemes = []; UnicodeUtils.each_grapheme(s) { |g| graphemes << g } # collect each grapheme cluster
graphemes.join(' ') #=> ɔ̃ w̃ ɔ t ɨ

This worked fine, except for the inverted breve mark: the code changes ̑a into ̑ a. I tried normalization (UnicodeUtils.nfc, UnicodeUtils.nfd), but to no avail. I don't know why the each_grapheme method has a problem with this particular diacritic, but I noticed that gedit also treats the inverted breve as a separate character, unlike tildes, accents, etc. So my question is: is there a straightforward way to normalize, i.e. to turn the combination of Latin Small Letter A and Combining Inverted Breve into Latin Small Letter A With Inverted Breve?

Recommended answer

I understand your question concerns Ruby, but I suppose the problem is much the same in Python. A simple solution is to test for combining diacritical marks explicitly:

import unicodedata

s = "ɔ̃w̃ɔtɨ"
clusters = []
for char in s:
    # A non-zero combining class means char is a combining mark:
    # attach it to the preceding base character.
    if unicodedata.combining(char) and clusters:
        clusters[-1] += char
    else:
        clusters.append(char)
print(" ".join(clusters))
# prints: ɔ̃ w̃ ɔ t ɨ
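On the normalization part of the question, here is a minimal sketch, assuming Python 3 and its standard unicodedata module. It suggests that NFC does compose Latin Small Letter A (U+0061) followed by Combining Inverted Breve (U+0311) into the precomposed U+0203, but only when the mark follows its base letter. That the mark precedes the letter in the asker's data is just a guess, not something the question confirms. The helper name compose is made up for illustration.

```python
import unicodedata


def compose(text):
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)


# Base letter followed by the mark: NFC composes them into one code point.
decomposed = "a\u0311"  # LATIN SMALL LETTER A + COMBINING INVERTED BREVE
composed = compose(decomposed)
print(unicodedata.name(composed))  # LATIN SMALL LETTER A WITH INVERTED BREVE

# Mark placed before the letter: there is no base to compose with,
# so the string comes back unchanged.
misordered = "\u0311a"
print(compose(misordered) == misordered)  # True

# Open o + combining tilde has no precomposed form, so NFC leaves
# it as two code points.
print(len(compose("\u0254\u0303")))  # 2
```

Since many IPA letter and mark combinations have no precomposed form, normalization alone cannot replace grouping by grapheme or by combining class. It only helps for pairs such as a plus inverted breve.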
