Ignoring diacritic characters when comparing words with special characters (é, è, ...)

Question

I have a list of Belgian cities whose names contain diacritic characters (Liège, Quiévrain, Franière, etc.), and I would like to compare them with another list that contains the same names in upper case but without the diacritical marks (LIEGE, QUIEVRAIN, FRANIERE).

My first attempt was to compare against the upper-case version:

"LIEGE".contentEquals("Liège".toUpperCase()), but that doesn't work, because the upper case of "Liège" is "LIÈGE", not "LIEGE".
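A minimal sketch reproducing the observation above, assuming the comparison was meant to be between string literals. The class name `UpperCaseDemo` and the helper `upper` are hypothetical; `Locale.ROOT` is used only so the result does not depend on the JVM's default locale.

```java
import java.util.Locale;

public class UpperCaseDemo {

    // Upper-cases with the root locale; accented letters keep their accent (è -> È).
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String upper = upper("Liège");
        System.out.println(upper);                          // still contains the grave accent
        System.out.println("LIEGE".contentEquals(upper));   // prints false
    }
}
```

Upper-casing only changes case; it never removes the accent, so a separate accent-stripping step (or an accent-insensitive comparison) is needed.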

I have some more complicated ideas, such as replacing each accented character one by one, but that seems clumsy and laborious.

Any idea on how to do that in a smart way?

Recommended answer

Check out this Java method, which replaces each accented character with its plain ASCII equivalent by looking it up in a table:

```java
// Wrapper class name is hypothetical; the answer showed only the class members.
public final class AccentRemover {

    // Plain ASCII replacements, position-aligned with UNICODE below.
    private static final String PLAIN_ASCII = "AaEeIiOoUu" // grave
            + "AaEeIiOoUuYy" // acute
            + "AaEeIiOoUuYy" // circumflex
            + "AaOoNn"       // tilde
            + "AaEeIiOoUuYy" // umlaut
            + "Aa"           // ring
            + "Cc"           // cedilla
            + "OoUu";        // double acute

    // Accented characters: UNICODE.charAt(i) maps to PLAIN_ASCII.charAt(i).
    private static final String UNICODE = "\u00C0\u00E0\u00C8\u00E8\u00CC\u00EC\u00D2\u00F2\u00D9\u00F9"
            + "\u00C1\u00E1\u00C9\u00E9\u00CD\u00ED\u00D3\u00F3\u00DA\u00FA\u00DD\u00FD"
            + "\u00C2\u00E2\u00CA\u00EA\u00CE\u00EE\u00D4\u00F4\u00DB\u00FB\u0176\u0177"
            + "\u00C3\u00E3\u00D5\u00F5\u00D1\u00F1"
            + "\u00C4\u00E4\u00CB\u00EB\u00CF\u00EF\u00D6\u00F6\u00DC\u00FC\u0178\u00FF"
            + "\u00C5\u00E5"               // ring
            + "\u00C7\u00E7"               // cedilla
            + "\u0150\u0151\u0170\u0171";  // double acute

    /**
     * Removes accents from a string, replacing each accented character
     * listed in UNICODE with its plain ASCII equivalent.
     * Characters not in the table are copied unchanged.
     */
    public static String removeAccents(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(s.length());
        boolean found = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // ASCII characters (<= 126) are never in the table, so skip the lookup.
            int pos = (c <= 126) ? -1 : UNICODE.indexOf(c);
            if (pos > -1) {
                found = true;
                sb.append(PLAIN_ASCII.charAt(pos));
            } else {
                sb.append(c);
            }
        }
        // Return the original instance when nothing was replaced.
        return found ? sb.toString() : s;
    }
}
```
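As an alternative that is not part of the answer above, the JDK's `java.text.Normalizer` can decompose accented letters into a base letter followed by combining marks (form NFD), after which the marks can be stripped with the regex category `\p{M}`. A minimal sketch, assuming the two city lists from the question; the class `CityMatcher` and its methods are hypothetical:

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;

public class CityMatcher {

    // NFD turns "è" into "e" + U+0300 (combining grave); then every combining mark is removed.
    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Matches an accented name against an upper-case, accent-free name.
    static boolean sameCity(String accented, String upperPlain) {
        return stripDiacritics(accented).toUpperCase(Locale.ROOT).equals(upperPlain);
    }

    public static void main(String[] args) {
        List<String> cities = List.of("Liège", "Quiévrain", "Franière");
        List<String> plain = List.of("LIEGE", "QUIEVRAIN", "FRANIERE");
        for (int i = 0; i < cities.size(); i++) {
            System.out.println(plain.get(i) + " matches: " + sameCity(cities.get(i), plain.get(i)));
        }
    }
}
```

Unlike a fixed lookup table, this covers any letter whose accent is a combining mark in Unicode's canonical decomposition (for example "ž"). Letters such as "ø", "æ" or "ß" have no such decomposition and pass through unchanged.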

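If the goal is only to test whether two names match, another option not taken from the answer is `java.text.Collator` with `PRIMARY` strength, which treats accent differences and case differences as insignificant. A minimal sketch, assuming a French-locale collator is appropriate for Belgian city names; the class `CityCollator` and its methods are hypothetical:

```java
import java.text.Collator;
import java.util.Locale;

public class CityCollator {

    // Builds a collator for which only base-letter differences matter.
    static Collator accentAndCaseInsensitive() {
        Collator collator = Collator.getInstance(Locale.FRENCH);
        // PRIMARY ignores secondary (accent) and tertiary (case) differences.
        collator.setStrength(Collator.PRIMARY);
        // Treat precomposed and decomposed forms of the same letter alike.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator;
    }

    static boolean sameCity(String a, String b) {
        return accentAndCaseInsensitive().equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(sameCity("Liège", "LIEGE"));
        System.out.println(sameCity("Quiévrain", "QUIEVRAIN"));
    }
}
```

This compares the strings directly instead of producing a stripped copy. When the accent-free form itself is needed (for display, or as a map key), a transformation such as `removeAccents` is still the better fit.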
