How to check if a Unicode character has diacritics in .NET?
Question
I am developing a heuristic for automatic language detection and would like to find out whether a given letter has diacritics (like "Ðàäèî Êóëüòóðà", where every letter has diacritics). Ideally, I would also like to get the type of diacritic.
I browsed through the UnicodeCategory enum but didn't find anything that could help me here.
Answer
One possible way is to normalize the string to a form in which letters and their diacritics are written as separate code points, and then check whether a letter is followed by accent marks.
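To illustrate what the decomposed form looks like, here is a minimal sketch (a C# 9+ top-level program; the `CodePoints` helper is hypothetical, written only for this illustration) that prints the UTF-16 code units of a precomposed "é" before and after FormD normalization:

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: formats each UTF-16 code unit as U+XXXX
static string CodePoints(string s) =>
    string.Join(" ", s.Select(ch => $"U+{(int)ch:X4}"));

string composed = "\u00E9";                                      // 'é' as one precomposed code point
string decomposed = composed.Normalize(NormalizationForm.FormD); // canonical decomposition

Console.WriteLine(CodePoints(composed));   // U+00E9
Console.WriteLine(CodePoints(decomposed)); // U+0065 U+0301
```

The single code point U+00E9 becomes the base letter U+0065 ('e') followed by U+0301 COMBINING ACUTE ACCENT, which is the letter-plus-accent pattern the check looks for.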
Adapting from "How do I remove diacritics (accents) from a string in .NET?", you can normalize with Normalize(NormalizationForm.FormD) and check for the diacritics with UnicodeCategory.NonSpacingMark.
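using System.Globalization;
using System.Linq;
using System.Text;

static bool IsLetterWithDiacritics(char c)
{
    // FormD splits a precomposed letter, e.g. 'é' -> 'e' + U+0301 COMBINING ACUTE ACCENT
    var s = c.ToString().Normalize(NormalizationForm.FormD);
    // True only if the result is a base letter followed exclusively by non-spacing marks
    return s.Length > 1 &&
           char.IsLetter(s[0]) &&
           s.Skip(1).All(c2 => CharUnicodeInfo.GetUnicodeCategory(c2) == UnicodeCategory.NonSpacingMark);
}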
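The function above tests one char at a time, so it only recognizes precomposed characters, and it does not say which diacritic is present. Since the question also asks for the type of diacritic, here is a hedged sketch that scans a whole string and reports, for each letter, the code points of the combining marks that follow it after FormD normalization. `FindDiacritics` is a hypothetical name, not a .NET API, and the sketch assumes C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: for every base letter in `text` that carries diacritics,
// returns the letter together with the code points of its combining marks.
static List<(char Letter, string Marks)> FindDiacritics(string text)
{
    var result = new List<(char Letter, string Marks)>();
    // FormD splits precomposed letters; already-decomposed input stays as it is
    string d = text.Normalize(NormalizationForm.FormD);
    for (int i = 0; i < d.Length; i++)
    {
        char letter = d[i];
        if (!char.IsLetter(letter)) continue;
        var marks = new List<string>();
        // Consume the run of non-spacing marks that follows the base letter
        while (i + 1 < d.Length &&
               CharUnicodeInfo.GetUnicodeCategory(d[i + 1]) == UnicodeCategory.NonSpacingMark)
        {
            i++;
            marks.Add($"U+{(int)d[i]:X4}");
        }
        if (marks.Count > 0)
            result.Add((letter, string.Join(" ", marks)));
    }
    return result;
}

// "naïve café", written with escapes to avoid source-encoding issues
foreach (var (letter, marks) in FindDiacritics("na\u00EFve caf\u00E9"))
    Console.WriteLine($"{letter}: {marks}"); // i: U+0308, then e: U+0301
```

The mark's code point identifies the diacritic (U+0301 is COMBINING ACUTE ACCENT, U+0308 is COMBINING DIAERESIS); turning it into a readable name would require the Unicode Character Database, as I'm not aware of a BCL API that returns character names. Note also that some letters that look accented, such as Ð (U+00D0), ø (U+00F8) or ł (U+0142), have no canonical decomposition, so neither this sketch nor the function above will flag them.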