How do I remove diacritics (accents) from a string in .NET?

Question

I'm trying to convert some strings that are in Canadian French. Basically, I'd like to strip the French accent marks from the letters while keeping the letters themselves (e.g. convert é to e, so that crème brûlée becomes creme brulee).

What is the best method for achieving this?

Recommended answer

I've not used this method, but Michael Kaplan describes a way of doing so in his confusingly titled blog post about stripping diacritics, "Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)":

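// Needs these directives at the top of the file; the method itself lives inside a class.
using System.Globalization;
using System.Text;

static string RemoveDiacritics(string text)
{
    // Decompose each accented character into a base character plus combining marks.
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder(normalizedString.Length);

    foreach (var c in normalizedString)
    {
        // Drop the combining marks (category Mn) and keep everything else.
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    // Recompose whatever is left into its canonical composed form.
    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}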

Note that this is a follow-up to his earlier post, "Stripping diacritics....".

The approach uses String.Normalize with NormalizationForm.FormD to decompose the input string into its constituent characters (basically separating the "base" characters from the combining diacritical marks). It then scans the result and keeps everything except the non-spacing marks, which leaves only the base characters. It's a little complicated, but then you're looking at a complicated problem.
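To make the decomposition step visible, here is a minimal sketch that prints each UTF-16 code unit of a FormD-normalized string together with its Unicode category. The class and method names (DecompositionDemo, Describe) are made up for this illustration and do not come from the blog post.

```csharp
using System;
using System.Globalization;
using System.Text;

static class DecompositionDemo
{
    // Returns one line per UTF-16 code unit of the FormD-normalized input,
    // formatted as "U+XXXX CategoryName".
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            sb.AppendFormat(CultureInfo.InvariantCulture, "U+{0:X4} {1}\n",
                (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // "\u00E9" is the precomposed letter é.
        Console.Write(Describe("\u00E9"));
        // U+0065 LowercaseLetter
        // U+0301 NonSpacingMark

        // "\u00F8" is ø, which has no canonical decomposition.
        Console.Write(Describe("\u00F8"));
        // U+00F8 LowercaseLetter
    }
}
```

The second call hints at why the problem is harder than it looks: letters such as ø have no canonical decomposition, so a filter on NonSpacingMark leaves them unchanged.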

Of course, if you're limiting yourself to French, you could probably get away with the simple table-based approach in "How to remove accents and tilde in a C++ std::string", as recommended by @David Dibben.
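As a rough idea of what a table-based version might look like in C#, here is a sketch that is not taken from the linked C++ answer. The class name FrenchAccentTable and the two lookup strings are assumptions written by hand for this example; they cover the common accented French letters only and leave ligatures such as œ and æ alone.

```csharp
using System;
using System.Text;

static class FrenchAccentTable
{
    // Hypothetical hand-written table: the character at index i in Accented
    // maps to the character at index i in Plain. Both strings must stay
    // aligned, and Accented must contain precomposed characters.
    const string Accented = "àâäçéèêëîïôöùûüÿÀÂÄÇÉÈÊËÎÏÔÖÙÛÜŸ";
    const string Plain    = "aaaceeeeiioouuuyAAACEEEEIIOOUUUY";

    public static string Strip(string text)
    {
        var sb = new StringBuilder(text.Length);
        // Compose first so that decomposed input (e + U+0301) still matches the table.
        foreach (char c in text.Normalize(NormalizationForm.FormC))
        {
            int i = Accented.IndexOf(c);
            sb.Append(i >= 0 ? Plain[i] : c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Strip("crème brûlée")); // creme brulee
    }
}
```

The trade-off is that the table handles only the characters you list, whereas the normalization approach handles any character that has a canonical decomposition.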
