Java - Fastest way to check the size of String


Question

I have the following code inside a loop. In each iteration, a string is appended to sb (a StringBuilder), and then I check whether the UTF-8 size of sb has reached 5MB (5242880 bytes).

if (sb.toString().getBytes("UTF-8").length >= 5242880) {
    // Do something
}

This works, but it is very slow (specifically, the size check).
What would be the fastest way to do this?

Answer

You can use the following method to calculate the UTF-8 length quickly:

public static int utf8Length(CharSequence cs) {
    // UTF-8 needs 1 byte up to U+007F, 2 up to U+07FF, 3 up to U+FFFF, 4 above that
    return cs.codePoints()
        .map(cp -> cp<=0x7ff? cp<=0x7f? 1: 2: cp<=0xffff? 3: 4)
        .sum();
}
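To check the method against the slow approach from the question, a small standalone class can compare utf8Length with String.getBytes(StandardCharsets.UTF_8).length for one character of each UTF-8 width. This is a sketch: the class name Utf8LengthCheck and the sample strings are illustrative assumptions, not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthCheck {
    // Same logic as the answer: UTF-8 byte count per code point.
    public static int utf8Length(CharSequence cs) {
        return cs.codePoints()
            .map(cp -> cp <= 0x7ff ? cp <= 0x7f ? 1 : 2 : cp <= 0xffff ? 3 : 4)
            .sum();
    }

    public static void main(String[] args) {
        // Illustrative samples: ASCII, Latin-1 (e-acute), BMP (euro sign), supplementary (emoji), all mixed.
        String[] samples = { "A", "\u00e9", "\u20ac", "\ud83d\ude00", "A\u00e9\u20ac\ud83d\ude00" };
        for (String s : samples) {
            int computed = utf8Length(s);                              // no byte array allocated
            int encoded = s.getBytes(StandardCharsets.UTF_8).length;   // the slow reference
            System.out.println(computed + " " + encoded);
        }
    }
}
```

Each printed line should show two equal numbers; the difference is that utf8Length never copies the text into a new String or byte array.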

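If ASCII characters dominate the contents, it might be slightly faster to use the following instead:

public static int utf8Length(CharSequence cs) {
    // start from the UTF-16 char count (1 byte per ASCII char), then add the extra bytes:
    // +1 for U+0080..U+07FF, +2 for U+0800..U+FFFF and for supplementary code points (2 chars, 4 bytes)
    return cs.length()
         + cs.codePoints().filter(cp -> cp>0x7f).map(cp -> cp<=0x7ff? 1: 2).sum();
}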
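A quick way to convince yourself that the ASCII-oriented variant returns the same result as the general one is to run both on mostly-ASCII text. In this sketch both methods live in one class, so the second is renamed to the hypothetical utf8LengthAsciiFast; the class name and the generated test text are also assumptions.

```java
public class Utf8LengthVariants {
    // General version: 1 to 4 bytes per code point.
    public static int utf8Length(CharSequence cs) {
        return cs.codePoints()
            .map(cp -> cp <= 0x7ff ? cp <= 0x7f ? 1 : 2 : cp <= 0xffff ? 3 : 4)
            .sum();
    }

    // ASCII-dominant version (renamed here): char count plus extra bytes for non-ASCII code points.
    public static int utf8LengthAsciiFast(CharSequence cs) {
        return cs.length()
             + cs.codePoints().filter(cp -> cp > 0x7f).map(cp -> cp <= 0x7ff ? 1 : 2).sum();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        // Mostly ASCII text with an occasional non-ASCII run every 100 lines.
        for (int i = 0; i < 1000; i++) {
            sb.append("log line ").append(i).append(i % 100 == 0 ? " \u00e9\u20ac\ud83d\ude00\n" : "\n");
        }
        System.out.println(utf8Length(sb) + " " + utf8LengthAsciiFast(sb));
    }
}
```

Both numbers should match. Whether the second version is actually faster depends on the data and the JIT, so it is worth measuring on your own input.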

But you may also consider avoiding a recalculation of the entire size: compute only the size of each new fragment you append to the StringBuilder and keep a running total, something like:

    StringBuilder sb = new StringBuilder();
    int length = 0; // running UTF-8 byte count of sb
    for(…; …; …) {
        String s = … //calculateNextString();
        sb.append(s);
        length += utf8Length(s); // measure only the newly appended fragment
        if(length >= 5242880) {
            // Do something

            // in case you're flushing the data:
            sb.setLength(0);
            length = 0;
        }
    }
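A runnable version of the loop above might look like the following sketch. Assumptions: the fragments come from a list instead of the elided calculateNextString(), "Do something" is modelled as storing the flushed content in a list, a final flush of any remainder is added, and the class name IncrementalSizeCheck, the method chunk and the tiny demo threshold are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class IncrementalSizeCheck {
    public static int utf8Length(CharSequence cs) {
        return cs.codePoints()
            .map(cp -> cp <= 0x7ff ? cp <= 0x7f ? 1 : 2 : cp <= 0xffff ? 3 : 4)
            .sum();
    }

    // Appends fragments and "flushes" the buffer (here: stores it in a list)
    // whenever the running UTF-8 size reaches the threshold.
    public static List<String> chunk(Iterable<String> fragments, int threshold) {
        List<String> flushed = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        int length = 0; // running UTF-8 byte count of sb
        for (String s : fragments) {
            sb.append(s);
            length += utf8Length(s); // only the new fragment is measured
            if (length >= threshold) {
                flushed.add(sb.toString());
                sb.setLength(0);
                length = 0;
            }
        }
        if (sb.length() > 0) { // hypothetical extra step: flush whatever is left
            flushed.add(sb.toString());
        }
        return flushed;
    }

    public static void main(String[] args) {
        List<String> fragments = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            fragments.add("caf\u00e9 #" + i + "\n");
        }
        // Tiny threshold so the demo flushes several times; the question uses 5242880.
        for (String c : chunk(fragments, 30)) {
            System.out.println(c.getBytes(StandardCharsets.UTF_8).length + " bytes");
        }
    }
}
```

The cost per iteration is now proportional to the fragment size rather than to everything accumulated so far.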

This assumes that any fragment containing a surrogate pair contains the whole pair, i.e. pairs are never split across two appended fragments. For ordinary applications, this should always be the case.
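To see why the assumption matters, the following sketch (class name SplitSurrogateDemo is illustrative) measures one emoji whole and then as two separately measured halves. codePoints() reports each lone surrogate as its own value in the U+D800..U+DFFF range, which the mapping counts as 3 bytes, so a split pair adds 6 to the running total instead of 4.

```java
public class SplitSurrogateDemo {
    public static int utf8Length(CharSequence cs) {
        return cs.codePoints()
            .map(cp -> cp <= 0x7ff ? cp <= 0x7f ? 1 : 2 : cp <= 0xffff ? 3 : 4)
            .sum();
    }

    public static void main(String[] args) {
        String emoji = "\ud83d\ude00";          // one supplementary code point, U+1F600
        String high = emoji.substring(0, 1);    // lone high surrogate
        String low = emoji.substring(1);        // lone low surrogate
        int whole = utf8Length(emoji);
        int split = utf8Length(high) + utf8Length(low);
        System.out.println("whole: " + whole + ", split: " + split);
    }
}
```

The overcount only makes the check trigger slightly early, but it is the reason the incremental approach relies on fragments keeping pairs intact.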

Another possibility, suggested by Didier-L, is to postpone the calculation until the StringBuilder's length reaches one third of the threshold: a single UTF-16 char never encodes to more than 3 UTF-8 bytes (a surrogate pair is 2 chars and 4 bytes), so before that point the UTF-8 length cannot reach the threshold. However, this is only beneficial if some executions never reach threshold / 3.
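The deferred check can be sketched as a cheap guard in front of the full count. The method name reachedThreshold and the class name are illustrative assumptions, and this variant recomputes the full size once past threshold / 3 rather than tracking it incrementally.

```java
public class DeferredSizeCheck {
    public static int utf8Length(CharSequence cs) {
        return cs.codePoints()
            .map(cp -> cp <= 0x7ff ? cp <= 0x7f ? 1 : 2 : cp <= 0xffff ? 3 : 4)
            .sum();
    }

    // Each UTF-16 char needs at most 3 UTF-8 bytes, so while sb.length() < threshold / 3
    // the UTF-8 size is below 3 * (threshold / 3) <= threshold and the count can be skipped.
    public static boolean reachedThreshold(CharSequence sb, int threshold) {
        if (sb.length() < threshold / 3) {
            return false;
        }
        return utf8Length(sb) >= threshold;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("\u20ac"); // 1000 chars, 3 UTF-8 bytes each
        }
        System.out.println(reachedThreshold(sb, 5242880)); // guard skips the full count
        System.out.println(reachedThreshold(sb, 3000));    // full count: 3000 >= 3000
        System.out.println(reachedThreshold(sb, 3001));    // full count: 3000 < 3001
    }
}
```

Integer division only makes the guard more conservative, so it never skips a count that could have reached the threshold.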
