Alternative to successive String.replace

Problem description

I want to replace some strings in a String input:

string=string.replace("<h1>","<big><big><big><b>");
string=string.replace("</h1>","</b></big></big></big>");
string=string.replace("<h2>","<big><big>");
string=string.replace("</h2>","</big></big>");
string=string.replace("<h3>","<big>");
string=string.replace("</h3>","</big>");
string=string.replace("<h4>","<b>");
string=string.replace("</h4>","</b>");
string=string.replace("<h5>","<small><b>");
string=string.replace("</h5>","</b></small>");
string=string.replace("<h6>","<small>");
string=string.replace("</h6>","</small>");

As you can see, this approach is not ideal: every call has to search the input again for the portion to replace, and because Strings are immutable, every replacement that matches produces a new String. The input is also large, so performance has to be considered.

Is there a better approach that reduces the complexity of this code?

Recommended answer

Although StringBuilder.replace() is a huge improvement over String.replace(), it is still very far from optimal.

The problem with StringBuilder.replace() is that if the replacement has a different length than the replaced part (which applies in our case), a bigger internal char array might have to be allocated and the content copied over before the replace itself takes place (which also involves copying).

Imagine this: you have a text with 10,000 characters. If you want to replace the "XY" substring found at position 1 (the 2nd character) with "ABC", the implementation may have to reallocate a char buffer that is larger by at least 1 and copy the old content to the new array. It then has to shift 9,997 characters (starting at position 3) to the right by 1 to fit "ABC" into the place of "XY", and finally copy the characters of "ABC" to the starting position 1. This has to be done for every replace, which is slow.
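
To make that cost concrete, here is a minimal sketch of what replacing every occurrence with StringBuilder.replace() could look like. The class name and the replaceAll helper are hypothetical and not part of the original answer; the sketch assumes a non-empty target.

```java
final class BuilderReplaceSketch {
    // Hypothetical helper: replaces every occurrence of target inside sb, in place.
    // Assumes target is non-empty.
    static void replaceAll(StringBuilder sb, String target, String replacement) {
        int idx = sb.indexOf(target);
        while (idx >= 0) {
            // Each call shifts all characters after the match, and may grow the buffer.
            sb.replace(idx, idx + target.length(), replacement);
            // Continue searching after the text that was just inserted.
            idx = sb.indexOf(target, idx + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("aXYbXYc");
        replaceAll(sb, "XY", "ABC");
        System.out.println(sb); // prints aABCbABCc
    }
}
```

With twelve tag pairs this loop would run twelve times over the buffer, and every hit moves the whole tail of the text, which is the overhead described above.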

We can build the output on the fly: parts that don't contain replaceable text can simply be appended to the output, and when we find a replaceable fragment, we append the replacement instead. In theory it is enough to loop over the input only once to generate the output. It sounds simple, and it is not that hard to implement.

Implementation:

We will use a Map preloaded with the replaceable-to-replacement string mappings:

import java.util.HashMap;
import java.util.Map;

Map<String, String> map = new HashMap<>();
map.put("<h1>", "<big><big><big><b>");
map.put("</h1>", "</b></big></big></big>");
map.put("<h2>", "<big><big>");
map.put("</h2>", "</big></big>");
map.put("<h3>", "<big>");
map.put("</h3>", "</big>");
map.put("<h4>", "<b>");
map.put("</h4>", "</b>");
map.put("<h5>", "<small><b>");
map.put("</h5>", "</b></small>");
map.put("<h6>", "<small>");
map.put("</h6>", "</small>");

Using this map, here is the replacer code (more explanation follows the code):

public static String replaceTags(String src, Map<String, String> map) {
    StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);

    for (int pos = 0;;) {
        int ltIdx = src.indexOf('<', pos);
        if (ltIdx < 0) {
            // No more '<', we're done:
            sb.append(src, pos, src.length());
            return sb.toString();
        }

        sb.append(src, pos, ltIdx); // Copy chars before '<'
        // Check if our hit is replaceable:
        boolean mismatch = true;
        for (Map.Entry<String, String> e : map.entrySet()) {
            String key = e.getKey();
            if (src.regionMatches(ltIdx, key, 0, key.length())) {
                // Match, append the replacement:
                sb.append(e.getValue());
                pos = ltIdx + key.length();
                mismatch = false;
                break;
            }
        }
        if (mismatch) {
            sb.append('<');
            pos = ltIdx + 1;
        }
    }
}

Testing it:

String in = "Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End";
System.out.println(in);
System.out.println(replaceTags(in, map));

Output (wrapped to avoid a scroll bar):

Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End

Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End

This solution is faster than using regular expressions, which involve considerable overhead such as compiling a Pattern and creating a Matcher, because regexps are a much more general tool. Regex-based replacement also creates many temporary objects under the hood that are thrown away after the replace. Here I only use a single StringBuilder (plus the char array under its hood), and the code iterates over the input String only once. This solution is also much faster than using StringBuilder.replace(), as detailed at the top of this answer.
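
For comparison, a regex-based version might look like the sketch below. It is not from the original answer: the class name and the pattern `</?h[1-6]>` are assumptions chosen to cover the same twelve tags, and it is shown only to illustrate the Pattern and Matcher machinery being compared against.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexReplaceSketch {
    // Assumed pattern: matches <h1>..<h6> and </h1>..</h6>.
    private static final Pattern HEADING_TAG = Pattern.compile("</?h[1-6]>");

    static String replaceTags(String src, Map<String, String> map) {
        Matcher m = HEADING_TAG.matcher(src);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = map.get(m.group());
            if (replacement == null) {
                replacement = m.group(); // tag not in the map: keep it unchanged
            }
            // quoteReplacement guards against '$' and '\' in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("<h3>", "<big>");
        map.put("</h3>", "</big>");
        System.out.println(replaceTags("a<h3>Hi!</h3>b", map)); // prints a<big>Hi!</big>b
    }
}
```

This also makes a single pass over the input, so any speed difference comes from the matcher and its temporary objects rather than from the number of passes.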

I initialized the StringBuilder in the replaceTags() method like this:

StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);

So basically I created it with an initial capacity of 150% of the length of the original String. This is because our replacements are longer than the replaceable texts, so if any replacing occurs, the output will be longer than the input. Giving StringBuilder a larger initial capacity avoids internal char[] reallocation as long as the output fits within it. The capacity actually required depends on the replaceable-replacement pairs and how often they occur in the input, but +50% is a reasonable estimate for typical input.
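
The effect of the initial capacity can be observed with StringBuilder.capacity(). The following is a small sketch with a hypothetical class and helper name; it only demonstrates that the buffer keeps its size while the content fits and grows once it does not.

```java
final class CapacitySketch {
    // Hypothetical helper: appends charsToAppend characters to a builder created
    // with initialCapacity, then reports the builder's capacity.
    static int capacityAfterAppend(int initialCapacity, int charsToAppend) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < charsToAppend; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        // Content fits: the internal buffer is never reallocated.
        System.out.println(capacityAfterAppend(150, 100)); // prints 150
        // Content exceeds the initial capacity: the buffer had to grow.
        System.out.println(capacityAfterAppend(150, 151) > 150); // prints true
    }
}
```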

I also took advantage of the fact that all replaceable strings start with a '<' character, so finding the next potential replaceable position becomes blazing-fast:

int ltIdx = src.indexOf('<', pos);

It's just a simple loop with char comparisons inside String, and since it always starts searching from pos (and not from the start of the input), overall the code iterates over the input String only once.

Finally, to tell whether a replaceable String actually occurs at the potential position, we use the String.regionMatches() method to check the replaceable strings. This is also blazing-fast, as all it does is compare char values in a loop and return at the very first mismatching character.
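
The interplay of indexOf() and regionMatches() can be tried in isolation. In this sketch the class name and the tagAt helper are hypothetical; the String methods are the ones used in the replacer above.

```java
final class RegionMatchesSketch {
    // Hypothetical helper: true if tag occurs in src starting exactly at offset.
    static boolean tagAt(String src, int offset, String tag) {
        return src.regionMatches(offset, tag, 0, tag.length());
    }

    public static void main(String[] args) {
        String src = "Yo<h1>TITLE</h1>";
        int ltIdx = src.indexOf('<', 0);
        System.out.println(ltIdx);                     // prints 2
        System.out.println(tagAt(src, ltIdx, "<h1>")); // prints true
        System.out.println(tagAt(src, ltIdx, "<h2>")); // prints false

        // The next search starts after the previous hit, not at the beginning.
        int next = src.indexOf('<', ltIdx + 1);
        System.out.println(next);                      // prints 11
        System.out.println(tagAt(src, next, "</h1>")); // prints true

        // A region that would run past the end of src is a mismatch, not an error.
        System.out.println(tagAt("a<h", 1, "<h1>"));   // prints false
    }
}
```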

One more plus:

The question doesn't mention it, but our input is an HTML document. HTML tag names are case-insensitive, which means the input might contain <H1> instead of <h1>.
For this algorithm, that is not a problem. The String class has a regionMatches() overload that supports case-insensitive comparison:

boolean regionMatches(boolean ignoreCase, int toffset, String other,
                          int ooffset, int len);

So if we want our algorithm to also find and replace input tags that are the same but written in a different letter case, all we have to modify is this one line:

if (src.regionMatches(true, ltIdx, key, 0, key.length())) {

With this modified code, replaceable tags become case-insensitive:

Yo<H1>TITLE</H1><h3>Hi!</h3>Nice day.<H6>Hi back!</H6>End
Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End
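
For readers who want to run the case-insensitive variant end to end, the snippets above can be consolidated roughly as follows. The class name is hypothetical and the map is shortened to two tag pairs for brevity; the logic follows the replaceTags() method shown earlier with the one-line change applied.

```java
import java.util.HashMap;
import java.util.Map;

final class CaseInsensitiveTagReplacer {
    static String replaceTags(String src, Map<String, String> map) {
        StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);
        int pos = 0;
        while (true) {
            int ltIdx = src.indexOf('<', pos);
            if (ltIdx < 0) {
                // No more '<': copy the rest and finish.
                sb.append(src, pos, src.length());
                return sb.toString();
            }
            sb.append(src, pos, ltIdx); // copy chars before '<'
            boolean matched = false;
            for (Map.Entry<String, String> e : map.entrySet()) {
                String key = e.getKey();
                // ignoreCase = true makes <H1> match the key <h1>.
                if (src.regionMatches(true, ltIdx, key, 0, key.length())) {
                    sb.append(e.getValue());
                    pos = ltIdx + key.length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // Not a replaceable tag: keep the '<' and move on.
                sb.append('<');
                pos = ltIdx + 1;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("<h1>", "<big><big><big><b>");
        map.put("</h1>", "</b></big></big></big>");
        map.put("<h6>", "<small>");
        map.put("</h6>", "</small>");
        System.out.println(replaceTags("Yo<H1>TITLE</H1><h6>Hi back!</H6>End", map));
    }
}
```

Iterating the map for every '<' is fine for a dozen keys; this relies on no key being a prefix of another, which holds for the heading tags used here.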
