Alternative to successive String.replace


Question

I want to replace several substrings in a String input:

string=string.replace("<h1>","<big><big><big><b>");
string=string.replace("</h1>","</b></big></big></big>");
string=string.replace("<h2>","<big><big>");
string=string.replace("</h2>","</big></big>");
string=string.replace("<h3>","<big>");
string=string.replace("</h3>","</big>");
string=string.replace("<h4>","<b>");
string=string.replace("</h4>","</b>");
string=string.replace("<h5>","<small><b>");
string=string.replace("</h5>","</b></small>");
string=string.replace("<h6>","<small>");
string=string.replace("</h6>","</small>");

As you can see, this approach is not ideal: every call searches the input again for the portion to replace, and because Strings are immutable, every call that finds a match also creates a new String. The input is large, so performance has to be considered.

Is there a better approach that reduces the complexity of this code?

Recommended answer

Although StringBuilder.replace() is an improvement over String.replace(), it is still very far from optimal.

The problem with StringBuilder.replace() is that if the replacement has a different length than the part being replaced (which applies to our case), a bigger internal char array might have to be allocated and the content copied over, and only then does the replacement happen (which also involves copying).

Imagine this: you have a text of 10,000 characters. If you want to replace the "XY" substring found at position 1 (the 2nd character) with "ABC", the implementation may have to reallocate a char buffer that is at least 1 larger and copy the old content into the new array. It then has to shift 9,997 characters (starting at position 3) to the right by 1 so that "ABC" fits in place of "XY", and finally it copies the characters of "ABC" to the starting position 1. This has to be done for every replacement! This is slow.
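For reference, the StringBuilder.replace() approach criticized above could look roughly like the sketch below. The class and method names are illustrative only and do not come from the question or the answer. Every call to sb.replace() with a replacement of a different length shifts the rest of the buffer, which is the cost described in the previous paragraph.

```java
final class SuccessiveReplaceSketch {
    // Replaces every occurrence of target inside sb, in place.
    // Assumption: target is non-empty (an empty target would never terminate).
    static void replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        int idx = sb.indexOf(target);
        while (idx >= 0) {
            // When the lengths differ, this shifts everything after the match.
            sb.replace(idx, idx + target.length(), replacement);
            // Resume after the inserted text so the replacement is not rescanned.
            idx = sb.indexOf(target, idx + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("aXYbXYc");
        replaceAll(sb, "XY", "ABC");
        System.out.println(sb); // aABCbABCc
    }
}
```

With twelve tag pairs, this loop would run twelve times over the same buffer, each time shifting the tail once per match.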

We can build the output on the fly: parts that don't contain replaceable text are simply appended to the output, and when we find a replaceable fragment, we append its replacement instead. In theory, a single pass over the input is enough to generate the output. It sounds simple, and it's not that hard to implement.

Implementation:

We will use a Map preloaded with the replaceable-to-replacement string mappings:

Map<String, String> map = new HashMap<>();
map.put("<h1>", "<big><big><big><b>");
map.put("</h1>", "</b></big></big></big>");
map.put("<h2>", "<big><big>");
map.put("</h2>", "</big></big>");
map.put("<h3>", "<big>");
map.put("</h3>", "</big>");
map.put("<h4>", "<b>");
map.put("</h4>", "</b>");
map.put("<h5>", "<small><b>");
map.put("</h5>", "</b></small>");
map.put("<h6>", "<small>");
map.put("</h6>", "</small>");

Using this map, here is the replacement code (more explanation follows the code):

public static String replaceTags(String src, Map<String, String> map) {
    StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);

    for (int pos = 0;;) {
        int ltIdx = src.indexOf('<', pos);
        if (ltIdx < 0) {
            // No more '<', we're done:
            sb.append(src, pos, src.length());
            return sb.toString();
        }

        sb.append(src, pos, ltIdx); // Copy chars before '<'
        // Check if our hit is replaceable:
        boolean mismatch = true;
        for (Map.Entry<String, String> e : map.entrySet()) {
            String key = e.getKey();
            if (src.regionMatches(ltIdx, key, 0, key.length())) {
                // Match, append the replacement:
                sb.append(e.getValue());
                pos = ltIdx + key.length();
                mismatch = false;
                break;
            }
        }
        if (mismatch) {
            sb.append('<');
            pos = ltIdx + 1;
        }
    }
}

Testing:

String in = "Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End";
System.out.println(in);
System.out.println(replaceTags(in, map));

Output (wrapped to avoid a scroll bar):

Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End

Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End

This solution is faster than using regular expressions, which involve a lot of overhead, such as compiling a Pattern and creating a Matcher, because regular expressions are much more general than we need here. They also create many temporary objects under the hood, which are thrown away after the replacement. Here I use only a StringBuilder (plus the char array under its hood), and the code iterates over the input String only once. This solution is also much faster than using StringBuilder.replace(), as detailed at the top of this answer.
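For comparison, a regular-expression version of the same replacement might look like the sketch below. It is not part of the original answer, and the class and method names are made up for illustration. It assumes that no key is a prefix of another key (true for the heading tags used here) and makes no claim about measured performance.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexReplaceSketch {
    static String replaceTagsRegex(String src, Map<String, String> map) {
        if (map.isEmpty()) {
            return src; // nothing to replace
        }
        // Build one alternation such as \Q<h1>\E|\Q</h1>\E from the map keys.
        StringBuilder alternation = new StringBuilder();
        for (String key : map.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(src);
        // StringBuffer is used because this appendReplacement overload exists in every Java version.
        StringBuffer out = new StringBuffer(src.length() + src.length() / 2);
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(map.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

This version also makes a single pass over the input, but it compiles a Pattern and allocates a Matcher on every call, which is the overhead mentioned above.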

I initialized the StringBuilder in the replaceTags() method like this:

StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);

So basically I created it with an initial capacity of 150% of the length of the original String. This is because our replacements are longer than the replaceable texts, so if any replacement occurs, the output will be longer than the input. Giving the StringBuilder a larger initial capacity usually avoids any reallocation of its internal char[]. Of course, the capacity actually required depends on the replaceable-replacement pairs and how often they occur in the input, but +50% is a reasonable estimate for typical documents. If a tag-dense input exceeds it, the StringBuilder simply grows as usual.
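Whether the 150% estimate holds for a given input can be checked with StringBuilder.capacity(). The helper below is a hypothetical illustration and not part of the original answer: it pre-sizes a builder the same way and reports whether appending a given output left the capacity unchanged.

```java
final class CapacitySketch {
    // Returns true if a builder pre-sized at 150% of inputLength can hold
    // the whole output without growing its internal array.
    static boolean fitsWithoutRegrowth(int inputLength, String output) {
        int initial = inputLength + inputLength / 2;
        StringBuilder sb = new StringBuilder(initial);
        sb.append(output);
        // capacity() changes only if the builder had to allocate a larger array.
        return sb.capacity() == initial;
    }
}
```

For a very tag-dense input, such as the short test string used above, the output can be longer than 150% of the input. In that case the builder grows once or twice and the result is still correct.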

I also took advantage of the fact that all replaceable strings start with a '<' character, so finding the next potential replacement position is very fast:

int ltIdx = src.indexOf('<', pos);

It's just a simple loop with char comparisons inside String, and since it always starts searching from pos (not from the start of the input), overall the code iterates over the input String only once.

Finally, to tell whether a replaceable String actually occurs at the candidate position, we use the String.regionMatches() method to check the replaceable strings. This is also very fast, because all it does is compare char values in a loop and return at the first mismatching character.
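As a small illustration of that check, the hypothetical helper below wraps the same regionMatches() call that replaceTags() uses. The helper name is an assumption made for this example.

```java
final class RegionMatchesSketch {
    // True if key occurs in src starting exactly at index at.
    // regionMatches() returns false (it does not throw) when the region
    // would run past the end of src.
    static boolean occursAt(String src, int at, String key) {
        return src.regionMatches(at, key, 0, key.length());
    }

    public static void main(String[] args) {
        System.out.println(occursAt("Yo<h1>TITLE", 2, "<h1>")); // true
        System.out.println(occursAt("Yo<h1>TITLE", 2, "<h2>")); // false
        System.out.println(occursAt("a<h", 1, "<h1>"));         // false
    }
}
```

No substring is created for the comparison, which is why this check stays cheap even when it is repeated for every key in the map.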

One more plus:

The question doesn't mention it, but our input is an HTML document. HTML tag names are case-insensitive, which means the input might contain <H1> instead of <h1>.
For this algorithm that is not a problem. The regionMatches() method of the String class has an overload that supports case-insensitive comparison:

boolean regionMatches(boolean ignoreCase, int toffset, String other,
                          int ooffset, int len);

So if we want to modify our algorithm to also find and replace input tags that are the same but written in a different letter case, all we have to modify is this one line:

if (src.regionMatches(true, ltIdx, key, 0, key.length())) {

With this modified code, the replaceable tags are matched case-insensitively:

Yo<H1>TITLE</H1><h3>Hi!</h3>Nice day.<H6>Hi back!</H6>End
Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End
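Putting the pieces together, a complete case-insensitive variant might look like the sketch below. It follows the structure of replaceTags() with the one-line change applied. The class and method names are illustrative, and it assumes that every key is non-empty and starts with '<'.

```java
import java.util.Map;

final class CaseInsensitiveTagsSketch {
    static String replaceTagsIgnoreCase(String src, Map<String, String> map) {
        StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);
        int pos = 0;
        while (true) {
            int ltIdx = src.indexOf('<', pos);
            if (ltIdx < 0) {
                // No more '<': copy the remainder and finish.
                sb.append(src, pos, src.length());
                return sb.toString();
            }
            sb.append(src, pos, ltIdx); // copy chars before '<'

            String replacement = null;
            int matchedLength = 0;
            for (Map.Entry<String, String> e : map.entrySet()) {
                String key = e.getKey();
                // ignoreCase = true lets "<H1>" match the key "<h1>".
                if (src.regionMatches(true, ltIdx, key, 0, key.length())) {
                    replacement = e.getValue();
                    matchedLength = key.length();
                    break;
                }
            }
            if (replacement != null) {
                sb.append(replacement);
                pos = ltIdx + matchedLength;
            } else {
                // Not a replaceable tag: keep the '<' and continue after it.
                sb.append('<');
                pos = ltIdx + 1;
            }
        }
    }
}
```

Called with the map from the answer and the mixed-case test input above, this should produce the same output as the lower-case version.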
