Removing accent marks (diacritics) from Latin characters for comparison

Problem description

I need to compare the names of European places that are written in the Latin alphabet with accent marks (diacritics) on some characters. Many Central and Eastern European names contain accented Latin characters such as ž and ü, but some people write the same names with plain Latin characters and no accent marks, such as z and u.

I need a way for my system to recognize that, for example, mšk žilina is the same as msk zilina, and likewise for all the other accented characters in use. Is there a simple way to do this?

Recommended answer

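You can use java.text.Normalizer and a small regular expression to get rid of the diacritical marks. Normalizing to form NFD splits each accented character into its base letter followed by one or more combining marks, and the regular expression then removes those marks: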

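import java.text.Normalizer;
import java.text.Normalizer.Form;

// Decompose accented characters (NFD), then drop the combining marks.
public static String removeDiacriticalMarks(String string) {
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}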

Usage example:

String text = "mšk žilina";
String normalized = removeDiacriticalMarks(text);
System.out.println(normalized); // msk zilina
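
Since the goal is comparison rather than display, the same idea can be wrapped in a small helper that builds a comparison key for each name. The following is only a sketch under a few assumptions: the class name PlaceNameMatcher and the methods comparisonKey and sameName are hypothetical names introduced for illustration, and the case folding and trimming are additions that the original answer does not include.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper class, not part of the original answer.
final class PlaceNameMatcher {

    // Builds a key for comparison: decompose (NFD), strip combining
    // diacritical marks, then fold case and trim surrounding whitespace.
    // The case folding and trimming are assumptions beyond the original answer.
    public static String comparisonKey(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return stripped.toLowerCase(Locale.ROOT).trim();
    }

    // Two names match when their comparison keys are identical.
    public static boolean sameName(String a, String b) {
        return comparisonKey(a).equals(comparisonKey(b));
    }

    public static void main(String[] args) {
        System.out.println(sameName("mšk žilina", "MSK Zilina")); // true
        System.out.println(sameName("Žilina", "Zvolen"));         // false
    }
}
```

Note that this approach only affects characters that decompose into a base letter plus combining marks. Letters such as ł or đ have no canonical decomposition, so they pass through unchanged and would probably need an explicit replacement step if they occur in your data.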
