How can I remove accents on a string?


Question

Possible duplicate:
  How do I remove diacritics (accents) from a string in .NET? (http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net)

I have the following string:

áéíóú

which I need to convert to

aeiou

How can I achieve this? (I don't need to compare strings; I need to save the new string.)

Not a duplicate of "How do I remove diacritics (accents) from a string in .NET?". The accepted answer there doesn't explain anything, and that's why I've "reopened" it.

Recommended answer

It depends on requirements. For most uses, normalising to NFD and then filtering out all combining characters will do. For some cases, normalising to NFKD is more appropriate (if you also want to remove some further distinctions between characters).
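To make the NFD versus NFKD choice concrete, here is a minimal sketch. `NormalizationDemo` and `StripMarks` are names made up for this illustration, and the example assumes a runtime with full Unicode normalisation data available. It strips the marks from the question's string, then shows a compatibility character (the "fi" ligature, U+FB01) that NFD leaves alone but NFKD splits into two letters.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: decompose with the given form, then drop
    // every character that Unicode classifies as a mark.
    public static string StripMarks(string src, NormalizationForm form)
    {
        var kept = src.Normalize(form).Where(c =>
        {
            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
            return cat != UnicodeCategory.NonSpacingMark
                && cat != UnicodeCategory.SpacingCombiningMark
                && cat != UnicodeCategory.EnclosingMark;
        });
        return new string(kept.ToArray());
    }

    public static void Main()
    {
        // "\u00E1\u00E9\u00ED\u00F3\u00FA" is the question's accented string.
        Console.WriteLine(StripMarks("\u00E1\u00E9\u00ED\u00F3\u00FA", NormalizationForm.FormD)); // aeiou

        // U+FB01 is the single-character "fi" ligature.
        Console.WriteLine(StripMarks("\uFB01", NormalizationForm.FormD).Length); // 1 (ligature kept)
        Console.WriteLine(StripMarks("\uFB01", NormalizationForm.FormKD));       // fi
    }
}
```

Whether folding the ligature is desirable is exactly the "further distinctions" question: NFKD is lossy in more ways than NFD, so it is worth choosing per use case.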

Some other distinctions will not be caught by this, notably stroked Latin characters. For some of them there's also no clear non-locale-specific mapping (should ł be considered equivalent to l or w?), so you may need to customise beyond this.

There are also some cases where NFD and NFKD don't work quite as expected, to allow for consistency between Unicode versions.

Hence:

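// Requires: using System; using System.Collections.Generic; using System.Globalization; using System.Text;
public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm, Func<char, char> customFolding)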
{
    foreach(char c in src.Normalize(compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD))
    switch(CharUnicodeInfo.GetUnicodeCategory(c))
    {
      case UnicodeCategory.NonSpacingMark:
      case UnicodeCategory.SpacingCombiningMark:
      case UnicodeCategory.EnclosingMark:
        //do nothing
        break;
      default:
        yield return customFolding(c);
        break;
    }
}
public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm)
{
  // Delegate to the enumerating overload so no intermediate string is built.
  return RemoveDiacriticsEnum(src, compatNorm, c => c);
}
public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
{
  StringBuilder sb = new StringBuilder();
  foreach(char c in RemoveDiacriticsEnum(src, compatNorm, customFolding))
    sb.Append(c);
  return sb.ToString();
}
public static string RemoveDiacritics(string src, bool compatNorm)
{
  return RemoveDiacritics(src, compatNorm, c => c);
}

Here the overloads without customFolding give us a default for the problem cases mentioned above, which just ignores them. We've also split building a string from generating the enumeration of characters, so we need not be wasteful in cases where the result doesn't have to be a string (say we were going to write the chars to output next, or do some further char-by-char manipulation).

As an example, a case where we also wanted to convert ł and Ł to l and L, but had no other specialised concerns, could use:

private static char NormaliseLWithStroke(char c)
{
  switch(c)
  {
     case 'ł':
       return 'l';
     case 'Ł':
       return 'L';
     default:
       return c;
  }
}

Passing this as customFolding to the above methods removes the stroke in this case, along with the decomposable diacritics.
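As a rough end-to-end illustration, the sketch below runs the Polish place name "Łódź" through the folding. `StrokeFoldingDemo` is a name invented for this example, and its `RemoveDiacritics` is a compact stand-in for the three-argument overload above, repeated only so the snippet compiles on its own. It again assumes a runtime with full Unicode normalisation data.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class StrokeFoldingDemo
{
    // Compact stand-in for the three-argument overload shown earlier.
    public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
    {
        var sb = new StringBuilder(src.Length);
        foreach (char c in src.Normalize(compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD))
        {
            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.EnclosingMark:
                    break; // drop combining marks
                default:
                    sb.Append(customFolding(c));
                    break;
            }
        }
        return sb.ToString();
    }

    // Same mapping as NormaliseLWithStroke above (U+0142 is ł, U+0141 is Ł).
    public static char NormaliseLWithStroke(char c)
    {
        switch (c)
        {
            case '\u0142': return 'l';
            case '\u0141': return 'L';
            default: return c;
        }
    }

    public static void Main()
    {
        string input = "\u0141\u00F3d\u017A"; // "Łódź"

        // Without custom folding the stroke survives: only ó and ź decompose.
        Console.WriteLine(RemoveDiacritics(input, false, c => c) == "\u0141odz"); // True

        // With the folding function the stroke is removed as well.
        Console.WriteLine(RemoveDiacritics(input, false, NormaliseLWithStroke)); // Lodz
    }
}
```

The first call shows why the custom step is needed at all: Ł has no canonical decomposition, so normalisation alone never turns it into L plus a mark.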
