Replacing special characters in a string


Question

I'd just like to know whether there is a more elegant and maintainable approach for this:

private String replaceSpecialChars(String fileName) {
    if (fileName.length() < 1) return null;

    if (fileName.contains("Ü")) {
        fileName = fileName.replace("Ü", "Ue");
    }

    if (fileName.contains("Ä")) {
        fileName = fileName.replace("Ä", "Ae");
    }

    if (fileName.contains("Ö")) {
        fileName = fileName.replace("Ö", "Oe");
    }

    if (fileName.contains("ü")) {
        fileName = fileName.replace("ü", "ue");
    }

    ...

    return fileName;
}

I'm restricted to Java 6.

Recommended answer

Before you go any further with this, note that what you're doing is effectively impossible. For example, the 'ASCII-fication' of 'Ö' in Swedish is 'O', not 'Oe'. There is no way to know whether a word is Swedish or German; after all, Swedes sometimes move to Germany. If you open a German phone book, see a Mrs. Sjögren, and asciify that to Sjoegren, you've messed it up.

If you want to run case- and asciification-insensitive comparisons, you first have to answer a few questions. Is muller equal to mueller equal to müller? That rabbit hole goes quite deep.
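To make the ambiguity concrete, here is a minimal sketch that asciifies the same names under two conventions. The class and method names are hypothetical, the German-style mapping covers only the two letters needed for the example, and the non-ASCII letters are written as Unicode escapes so the source compiles regardless of file encoding. It uses java.text.Normalizer, which is available in Java 6.

```java
import java.text.Normalizer;

class AsciifyAmbiguity {
    // German convention (illustrative subset): o-umlaut -> "oe", u-umlaut -> "ue".
    static String germanStyle(String s) {
        return s.replace("\u00F6", "oe").replace("\u00FC", "ue");
    }

    // "Strip the dots" convention: decompose to NFD, then drop combining marks.
    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        String[] names = { "Sj\u00F6gren", "M\u00FCller" };
        for (String name : names) {
            // The two conventions disagree, and nothing in the string says which applies.
            System.out.println(germanStyle(name) + " vs " + stripDiacritics(name));
        }
    }
}
```

Both results are plausible ASCII spellings, and the input alone does not tell you which one the person would accept.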

The general solution is trigrams or other generalized text search tools, such as those provided by Postgres. Alternatively, opt out of this mechanism, store this data as Unicode, and be clear that to find Ms. Sjögren you have to search for "Sjögren", for the same reason that you won't find Mr. Johnson by searching for Jahnson.

Note that most filesystems allow Unicode filenames; there is no need to replace a Ü.

This also goes some way toward explaining why there are no ready-made libraries for this seemingly common job: the job is, in fact, impossible.

You can simplify this code by using a Map<String, String> of replacements if you must. I advise against it for the reasons above. Or just keep it as is, but ditch the contains checks. As written, the code is needlessly slow and lengthy.
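For illustration only, the Map-based variant might look like the sketch below. The class name and the constant REPLACEMENTS are hypothetical; the map holds only the four mappings shown in the question, and the generics are spelled out in full because the diamond operator is not available in Java 6. A LinkedHashMap is used so the replacements run in a predictable order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapReplacer {
    // Only the mappings visible in the question; extend as needed.
    private static final Map<String, String> REPLACEMENTS =
            new LinkedHashMap<String, String>();
    static {
        REPLACEMENTS.put("\u00DC", "Ue"); // U-umlaut
        REPLACEMENTS.put("\u00C4", "Ae"); // A-umlaut
        REPLACEMENTS.put("\u00D6", "Oe"); // O-umlaut
        REPLACEMENTS.put("\u00FC", "ue"); // u-umlaut
    }

    static String replaceSpecialChars(String fileName) {
        // Mirrors the question's behaviour of returning null for an empty name.
        if (fileName.isEmpty()) return null;
        String result = fileName;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            // No contains() check: replace() already handles the no-match case.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(replaceSpecialChars("\u00DCbung.txt"));
    }
}
```

Adding a mapping then means adding one put call, but the caveat about language-specific conventions applies unchanged.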

There is no difference between:

if (fileName.contains("x")) fileName = fileName.replace("x", "y");

and just fileName = fileName.replace("x", "y"); except that the former is strictly slower. replace does not make a new string and returns itself if you ask it to replace a string that it does not contain. The former searches twice, the latter only once, and neither makes a new string unless an actual replacement needs to be done.
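As a quick check of the equivalence, the following sketch runs both forms over a few inputs. The class and method names are hypothetical. It compares results with equals only; it does not measure speed, and it does not rely on whether the same String instance is returned when nothing matches.

```java
class ReplaceWithoutContains {
    // The question's pattern: search first, then replace.
    static String guarded(String s) {
        if (s.contains("x")) {
            s = s.replace("x", "y");
        }
        return s;
    }

    // The simplified pattern: just replace.
    static String direct(String s) {
        return s.replace("x", "y");
    }

    public static void main(String[] args) {
        String[] inputs = { "axbx", "abc", "" };
        for (String input : inputs) {
            // Prints true for each input: both forms yield equal strings.
            System.out.println(guarded(input).equals(direct(input)));
        }
    }
}
```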

Then you can chain it:

if (fileName.isEmpty()) return null;
return fileName
    .replace("Ü", "Ue")
    .replace("Ä", "Ae")
    ...
    ;

But, as I said, you probably don't want to do that, unless you want an aggravated person on the line at some point in the future, complaining that you bungled the asciification of their surname.
