How to check if a Unicode character has diacritics in .NET?

Question

I am developing a heuristic for automatic language detection and would like to find out whether a given letter has diacritics (as in "Ðàäèî Êóëüòóðà", where all letters have diacritics). Ideally I would also get the type of diacritic, if possible.

I browsed through the UnicodeCategory enum but didn't find anything that could help me here.

Recommended answer

One possible way is to normalize the character to a form in which a letter and its diacritics are written as separate code points, and then check whether you have a letter followed by combining accents.

Adapting from "How do I remove diacritics (accents) from a string in .NET?", you can normalize with Normalize(NormalizationForm.FormD) and check for the diacritics with UnicodeCategory.NonSpacingMark.

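using System.Globalization;
using System.Linq;
using System.Text;

// True if c decomposes (FormD) into a base letter followed only by non-spacing marks.
bool IsLetterWithDiacritics(char c)
{
    var s = c.ToString().Normalize(NormalizationForm.FormD);
    return s.Length > 1 &&
           char.IsLetter(s[0]) &&
           s.Skip(1).All(c2 => CharUnicodeInfo.GetUnicodeCategory(c2) == UnicodeCategory.NonSpacingMark);
}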
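
The question also asks for the type of diacritic. A minimal sketch of one way to get it, using the same decomposition: return the combining marks themselves rather than a bool. The names DiacriticInfo and GetDiacritics are hypothetical and not part of the original answer, and the sketch assumes the input is a single BMP character (a lone surrogate would make Normalize throw).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

static class DiacriticInfo
{
    // Hypothetical helper: returns the non-spacing marks that follow the base
    // character after canonical decomposition (empty list if there are none).
    public static IReadOnlyList<char> GetDiacritics(char c)
    {
        string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
        return decomposed
            .Skip(1) // skip the base character
            .Where(m => CharUnicodeInfo.GetUnicodeCategory(m) == UnicodeCategory.NonSpacingMark)
            .ToList();
    }

    public static void Main()
    {
        // Print each character with the code points of its combining marks.
        foreach (char c in "àäèîÊóëüòð")
        {
            var marks = GetDiacritics(c).Select(m => $"U+{(int)m:X4}");
            Console.WriteLine($"{c}: {string.Join(", ", marks)}");
        }
    }
}
```

For example, 'ü' yields U+0308 (COMBINING DIAERESIS) and 'à' yields U+0300 (COMBINING GRAVE ACCENT). Letters such as 'Ð' and 'ð' have no canonical decomposition, so this approach reports no diacritics for them; a language-detection heuristic may need to treat those code points separately.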
