Efficiently removing specific characters (some punctuation) from Strings in Java?

Question

In Java, what is the most efficient way of removing given characters from a String? Currently, I have this code:

private static String processWord(String x) {
    String tmp;

    tmp = x.toLowerCase();
    tmp = tmp.replace(",", "");
    tmp = tmp.replace(".", "");
    tmp = tmp.replace(";", "");
    tmp = tmp.replace("!", "");
    tmp = tmp.replace("?", "");
    tmp = tmp.replace("(", "");
    tmp = tmp.replace(")", "");
    tmp = tmp.replace("{", "");
    tmp = tmp.replace("}", "");
    tmp = tmp.replace("[", "");
    tmp = tmp.replace("]", "");
    tmp = tmp.replace("<", "");
    tmp = tmp.replace(">", "");
    tmp = tmp.replace("%", "");

    return tmp;
}

Would it be faster if I used some sort of StringBuilder, or a regex, or maybe something else? Yes, I know: profile it and see, but I hope someone can provide an answer off the top of their head, as this is a common task.

Recommended answer

Here's a late answer, just for fun.

In cases like this, I would suggest aiming for readability over speed. Of course you can be super-readable but too slow, as in this super-concise version:

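private static String processWord(String x) {
    // '[' and ']' are escaped: an unescaped '[' inside a Java character class starts a nested class
    return x.replaceAll("[\\[\\](){},.;!?<>%]", "");
}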

This is slow because every time you call this method, String.replaceAll compiles the regex again. So you can pre-compile the regex once and reuse it.

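import java.util.regex.Pattern;

// Compiled once, when the class is initialized, instead of on every call
private static final Pattern UNDESIRABLES = Pattern.compile("[\\[\\](){},.;!?<>%]");

private static String processWord(String x) {
    return UNDESIRABLES.matcher(x).replaceAll("");
}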
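For reference, here is a self-contained sketch of the precompiled-pattern approach that also keeps the toLowerCase() call from the question's processWord (the answer's snippets drop it). The class name ProcessWordRegex is hypothetical. The square brackets are escaped in the pattern because Java reads an unescaped [ inside a character class as the start of a nested class.

```java
import java.util.regex.Pattern;

public class ProcessWordRegex {
    // One character class covering all fourteen characters from the question.
    // '[' and ']' are escaped so Java does not treat '[' as a nested class.
    private static final Pattern UNDESIRABLES = Pattern.compile("[\\[\\](){},.;!?<>%]");

    // Lowercases first, as the question's code does, then removes every
    // matched character in a single replaceAll pass.
    static String processWord(String x) {
        return UNDESIRABLES.matcher(x.toLowerCase()).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(processWord("Hello, (World)! [50%] {ok}?")); // prints "hello world 50 ok"
    }
}
```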

This should be fast enough for most purposes, assuming the JVM's regex engine optimizes the character class lookup. This is the solution I would use, personally.

Now without profiling, I wouldn't know whether you could do better by making your own character (actually codepoint) lookup table:

// One entry per char value; true means "keep this character"
private static final boolean[] CHARS_TO_KEEP = new boolean[Character.MAX_VALUE + 1];

Fill this once and then iterate, making your resulting string. I'll leave the code to you. :)
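The answer leaves the table-based version as an exercise; one possible sketch follows, which is not the author's code. It assumes the table is indexed by char (UTF-16 code unit) rather than by code point. That is enough here because every character being removed is ASCII, and characters outside the Basic Multilingual Plane arrive as two surrogate chars, which the table keeps. ProcessWordTable is a hypothetical name, and the toLowerCase() call mirrors the question's code.

```java
import java.util.Arrays;

public class ProcessWordTable {
    // One entry per char value (65,536 entries); true means "keep this character".
    private static final boolean[] CHARS_TO_KEEP = new boolean[Character.MAX_VALUE + 1];

    // Filled once, when the class is initialized.
    static {
        Arrays.fill(CHARS_TO_KEEP, true);
        for (char c : "[](){},.;!?<>%".toCharArray()) {
            CHARS_TO_KEEP[c] = false;
        }
    }

    static String processWord(String x) {
        String lower = x.toLowerCase();
        // The result can never be longer than the input, so size the buffer once.
        char[] out = new char[lower.length()];
        int n = 0;
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (CHARS_TO_KEEP[c]) {
                out[n++] = c;
            }
        }
        return new String(out, 0, n);
    }

    public static void main(String[] args) {
        System.out.println(processWord("Hello, (World)! [50%] {ok}?")); // prints "hello world 50 ok"
    }
}
```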

Again, I wouldn't dive into this kind of optimization; the code quickly becomes hard to read. Is performance really that much of a concern? Also remember that runtimes such as the JVM JIT-compile frequently executed code, so it performs better after warming up; measure with a good profiler rather than guessing.

One thing that should be mentioned is that the code in the original question performs poorly because it makes fifteen passes over the string (toLowerCase plus fourteen replace calls), each of which can create a new temporary string. Unless a compiler optimizes all that away, that particular solution will perform the worst.
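To make the allocation point concrete, here is a sketch of a single-pass rewrite of the question's method using a StringBuilder, which the question also asked about. It is one possible approach and does not come from the answer; the class name ProcessWordSinglePass is hypothetical. Apart from the lowercased copy, it allocates only the builder and the final string, no matter how many characters are removed.

```java
public class ProcessWordSinglePass {
    // Lowercase once, then copy every character that is not one of the
    // question's fourteen punctuation marks into a pre-sized StringBuilder.
    static String processWord(String x) {
        String lower = x.toLowerCase();
        StringBuilder sb = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case ',': case '.': case ';': case '!': case '?':
                case '(': case ')': case '{': case '}': case '[': case ']':
                case '<': case '>': case '%':
                    break; // drop punctuation
                default:
                    sb.append(c); // keep everything else
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(processWord("Hello, (World)! [50%] {ok}?")); // prints "hello world 50 ok"
    }
}
```

A switch over literal char cases keeps the set of removed characters visible in one place; whether it beats the precompiled regex or a lookup table is something only profiling can settle.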
