迭代字符串中所有字符的最快方法 [英] Fastest way to iterate over all the chars in a String

查看:39
本文介绍了迭代字符串中所有字符的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 Java 中,迭代字符串中所有字符的最快方法是什么:

String str = "一个非常非常长的字符串";for (int i = 0, n = str.length(); i 

或者这个:

char[] chars = str.toCharArray();for (int i = 0, n = chars.length; i < n; i++) {字符 c = 字符 [i];}

我想知道的是,在长时间迭代期间重复调用 charAt 方法的成本是否会小于或大于执行单次调用 的成本>toCharArray 开始,然后在迭代过程中直接访问数组.

如果有人能够为不同的字符串长度提供一个健壮的基准,那就太好了,考虑到 JIT 预热时间、JVM 启动时间等,而不仅仅是对 System 的两次调用之间的差异.currentTimeMillis().

解决方案

第一次更新:在您在生产环境中尝试之前(不建议),请先阅读以下内容:http://www.javaspecialists.eu/archive/Issue237.html从 Java 9 开始,所描述的解决方案将不再适用,因为现在 Java 默认将字符串存储为 byte[].

第二次更新:截至 2016 年 10 月 25 日,在我的 AMDx64 8core 和源 1.8 上,使用charAt"和字段访问之间没有区别.看来 jvm 已经足够优化,可以内联和简化任何string.charAt(n)"调用.

第三次更新:截至 2020 年 9 月 7 日,在我的 Ryzen 1950-X 16 内核和源 1.14 上,charAt1"比字段访问慢 9 倍,charAt2"比字段访问慢 4 倍.现场访问作为明显的赢家回来了.请注意,对于 Java 9+ 版本的 jvm,程序需要使用 byte[] 访问权限.

这完全取决于被检查的 String 的长度.如果,正如问题所说,它用于 long 字符串,检查字符串的最快方法是使用反射访问字符串的支持 char[].

在 64 AMD Phenom II 4 核 955 @ 3.2 GHZ(客户端模式和服务器模式)上使用 JDK 8(win32 和 win64)进行的完全随机基准测试,使用 9 种不同的技术(见下文!)表明使用 String.charAt(n) 对于小字符串来说是最快的,并且使用 reflection 访问 String 支持数组的速度几乎是大字符串的两倍.

实验

  • 尝试了 9 种不同的优化技术.

  • 所有字符串内容都是随机的

  • 测试是针对以 0、1、2、4、8、16 等开头的 2 倍数的字符串大小进行的.

  • 对每个字符串大小进行 1,000 次测试

  • 每次测试都会随机排列.换句话说,每次测试都是以随机顺序进行的,超过 1000 次.

  • 整个测试套件向前和向后完成,以显示 JVM 预热对优化和时间的影响.

  • 整个套件做了两次,一次在 -client 模式下,另一个在 -server 模式下.

结论

-客户端模式(32位)

对于长度为 1 到 256 个字符的字符串,调用 string.charAt(i) 胜出,平均每秒处理 1340 万到 588 万个字符.

此外,它的总体速度提高了 5.5%(客户端)和 13.9%(服务器),如下所示:

 for (int i = 0; i 

像这样使用局部最终长度变量:

 final int len = data.length();for (int i = 0; i < len; i++) {如果 (data.charAt(i) <= ' ') {投掷();}}

对于长字符串,512 到 256K 个字符长度,使用反射访问 String 的后备数组最快.这种技术的速度几乎是 String.charAt(i) 的两倍(快 178%).此范围内的平均速度为每秒 11.11 亿个字符.

字段必须提前获取,然后可以在库中的不同字符串上重新使用.有趣的是,与上面的代码不同,使用 Field 访问时,使用局部最终长度变量比在循环检查中使用 'chars.length' 快 9%.以下是如何以最快的速度设置现场访问:

 final Field field = String.class.getDeclaredField("value");field.setAccessible(true);尝试 {final char[] chars = (char[]) field.get(data);最终 int len = chars.length;for (int i = 0; i < len; i++) {如果(字符 [i] <= ' '){投掷();}}返回 len;} 捕捉(异常前){throw new RuntimeException(ex);}

-server 模式的特别说明

在我的 AMD 64 机器上的 64 位 Java 机器上的服务器模式下,字段访问在 32 个字符长度的字符串后开始获胜.直到客户端模式下的 512 个字符长度才看到.

另外值得注意的是,当我在服务器模式下运行 JDK 8(32 位构建)时,大字符串和小字符串的整体性能都降低了 7%.这是 JDK 8 早期版本的 2013 年 12 月 121 日版本.所以,目前看来,32 位服务器模式比 32 位客户端模式慢.

话虽如此……似乎唯一值得调用的服务器模式是在 64 位机器上.否则它实际上会妨碍性能.

对于在 AMD64 上以 -server 模式 运行的 32 位构建,我可以这样说:

  1. String.charAt(i) 是明显的赢家.尽管大小在 8 到 512 个字符之间,但在新"、重用"和字段"中还是有优胜者.
  2. String.charAt(i) 在客户端模式下快 45%
  3. 在客户端模式下,大字符串的字段访问速度是原来的两倍.

另外值得一提的是,String.chars()(Stream 和并行版本)是一个废品.比任何其他方式都慢.Streams API 是执行一般字符串操作的一种相当慢的方式.

愿望清单

Java String 可以具有接受优化方法的谓词,例如 contains(predicate)、forEach(consumer)、forEachWithIndex(consumer).因此,无需用户知道长度或重复调用 String 方法,这些可以帮助解析库beep-beep beep 加速.

继续做梦:)

快乐的弦乐!

~SH

测试使用了以下 9 种方法来测试字符串是否存在空格:

charAt1"-- 以通常方式检查字符串内容:

int charAtMethod1(final String data) {最终 int len = data.length();for (int i = 0; i < len; i++) {如果 (data.charAt(i) <= ' ') {投掷();}}返回 len;}

charAt2"-- 与上面相同,但使用 String.length() 而不是为 LENGTh 制作最终的本地 int

int charAtMethod2(final String data) {for (int i = 0; i < data.length(); i++) {如果 (data.charAt(i) <= ' ') {投掷();}}返回数据长度();}

流"-- 使用新的 JAVA-8 字符串的 IntStream 并传递它一个谓词来做检查

int streamMethod(final String data, final IntPredicate predicate) {如果(data.chars().anyMatch(谓词)){投掷();}返回数据长度();}

streamPara"-- 与上面相同,但 OH-LA-LA - 平行!!!

//不惜一切代价避免这种情况int streamParallelMethod(最终字符串数据,IntPredicate谓词){if (data.chars().parallel().anyMatch(predicate)) {投掷();}返回数据长度();}

重用"-- 用字符串内容重新填充可重复使用的 char[]

int重用BuffMethod(final char[] reusable, final String data) {最终 int len = data.length();data.getChars(0, len, reusable, 0);for (int i = 0; i < len; i++) {如果(可重用 [i] <= ' '){投掷();}}返回 len;}

new1"-- 从字符串中获取 char[] 的新副本

int newMethod1(final String data) {最终 int len = data.length();最终 char[] copy = data.toCharArray();for (int i = 0; i < len; i++) {如果(复制 [i] <= ' '){投掷();}}返回 len;}

new2"-- 同上,但使用FOR-EACH"

int newMethod2(final String data) {for (final char c : data.toCharArray()) {如果 (c <= ' ') {投掷();}}返回数据长度();}

字段1"- 想要!!获取字段以访问字符串的内部字符[]

int fieldMethod1(final Field field, final String data) {尝试 {final char[] chars = (char[]) field.get(data);最终 int len = chars.length;for (int i = 0; i < len; i++) {如果(字符 [i] <= ' '){投掷();}}返回 len;} 捕捉(异常前){throw new RuntimeException(ex);}}

字段2"-- 同上,但使用FOR-EACH"

int fieldMethod2(final Field field, final String data) {最终 char[] 字符;尝试 {chars = (char[]) field.get(data);} 捕捉(异常前){throw new RuntimeException(ex);}for (final char c : chars) {如果 (c <= ' ') {投掷();}}返回字符长度;}

客户端 -client 模式的复合结果(向前和向后测试组合)

注意:Java 32 位的 -client 模式和 Java 64 位的 -server 模式在我的 AMD64 机器上与下面相同.

Size WINNER charAt1 charAt2 流 streamPar 重用 new1 new2 field1 field21 个字符 77.0 72.0 462.0 584.0 127.5 89.5 86.0 159.5 165.02 字符 38.0 36.5 284.0 32712.5 57.5 48.3 50.3 89.0 91.54 个字符 19.5 18.5 458.6 3169.0 33.0 26.8 27.5 54.1 52.68 个字符 9.8 9.9 100.5 1370.9 17.3 14.4 15.0 26.9 26.416 个字符 6.1 6.5 73.4 857.0 8.4 8.2 8.3 13.6 13.532 个字符 3.9 3.7 54.8 428.9 5.0 4.9 4.7 7.0 7.264 个字符 2.7 2.6 48.2 232.9 3.0 3.2 3.3 3.9 4.0128 个字符 2.1 1.9 43.7 138.8 2.1 2.6 2.6 2.4 2.6256 个字符 1.9 1.6 42.4 90.6 1.7 2.1 2.1 1.7 1.8512 field1 1.7 1.4 40.6 60.5 1.4 1.9 1.9 1.3 1.41,024 场 1 1.6 1.4 40.0 45.6 1.2 1.9 2.1 1.0 1.22,048 场 1 1.6 1.3 40.0 36.2 1.2 1.8 1.7 0.9 1.14,096 场 1 1.6 1.3 39.7 32.6 1.2 1.8 1.7 0.9 1.08,192 场 1 1.6 1.3 39.6 30.5 1.2 1.8 1.7 0.9 1.016,384 场 1 1.6 1.3 39.8 28.4 1.2 1.8 1.7 0.8 1.032,768 场 1 1.6 1.3 40.0 26.7 1.3 1.8 1.7 0.8 1.065,536 场 1 1.6 1.3 39.8 26.3 1.3 1.8 1.7 0.8 1.0131,072 场 1 1.6 1.3 40.1 25.4 1.4 1.9 1.8 0.8 1.0262,144 场 1 1.6 1.3 39.6 25.2 1.5 1.9 1.9 0.8 1.0

服务器的复合结果 -server 模式(向前和向后测试组合)

注意:这是在 AMD64 上以服务器模式运行的 Java 32 位测试.Java 64位的服务器模式与客户端模式的Java 32位相同,只是字段访问在32个字符大小后开始获胜.

Size WINNER charAt1 charAt2 流 streamPar 重用 new1 new2 field1 field21 个字符 74.5 95.5 524.5 783.0 90.5 102.5 90.5 135.0 151.52 字符 48.5 53.0 305.0 30851.3 59.3 57.5 52.0 88.5 91.84 个字符 28.8 32.1 132.8 2465.1 37.6 33.9 32.3 49.0 47.08 新 2 18.0 18.6 63.4 1541.3 18.5 17.9 17.6 25.4 25.816 新 2 14.0 14.7 129.4 1034.7 12.5 16.2 12.0 16.0 16.632 新 2 7.8 9.1 19.3 431.5 8.1 7.0 6.7 7.9 8.764 重用 6.1 7.5 11.7 204.7 3.5 3.9 4.3 4.2 4.1128 重用 6.8 6.8 9.0 101.0 2.6 3.0 3.0 2.6 2.7256 field2 6.2 6.5 6.9 57.2 2.4 2.7 2.9 2.3 2.3512 重用 4.3 4.9 5.8 28.2 2.0 2.6 2.6 2.1 2.11,024 个字符 2.0 1.8 5.3 17.6 2.1 2.5 3.5 2.0 2.02,048 个字符 1.9 1.7 5.2 11.9 2.2 3.0 2.6 2.0 2.04,096 个字符 1.9 1.7 5.1 8.7 2.1 2.6 2.6 1.9 1.98,192 个字符At 1.9 1.7 5.1 7.6 2.2 2.5 2.6 1.9 1.916,384 个字符 1.9 1.7 5.1 6.9 2.2 2.5 2.5 1.9 1.932,768 个字符At 1.9 1.7 5.1 6.1 2.2 2.5 2.5 1.9 1.965,536 个字符 1.9 1.7 5.1 5.5 2.2 2.4 2.4 1.9 1.9131,072 个字符 1.9 1.7 5.1 5.4 2.3 2.5 2.5 1.9 1.9262,144 个字符在 1.9 1.7 5.1 5.1 2.3 2.5 2.5 1.9 1.9

完整的可运行程序代码

(要在 Java 7 及更早版本上进行测试,删除两个流测试)

import java.lang.reflect.Field;导入 java.util.ArrayList;导入 java.util.Collections;导入 java.util.List;导入 java.util.Random;导入 java.util.function.IntPredicate;/*** @author Saint Hill <http://stackoverflow.com/users/1584255/saint-hill>*/公共最终类 TestStrings {//我们不会测试长度超过 512KM 的字符串最终 int MAX_STRING_SIZE = 1024 * 256;//对于每个字符串大小,我们将进行所有测试//这很多次最终 int TRIES_PER_STRING_SIZE = 1000;public static void main(String[] args) 抛出异常 {new TestStrings().run();}void run() 抛出异常 {//将数据长度加倍,直到达到 MAX 个字符长//0,1,2,4,8,16,32,64,128,256 ...最终列表<整数>大小 = 新的 ArrayList();for (int n = 0; n <= MAX_STRING_SIZE; n = (n == 0 ? 1 : n * 2)) {尺寸.添加(n);}//创建随机数(用于测试的随机顺序)最终随机随机=新随机();System.out.println(检查每个字符的速度以纳秒为单位.");System.out.printf("==== FORWARDS (尝试每个大小:%s) ==== 
", TRIES_PER_STRING_SIZE);打印标题(TRIES_PER_STRING_SIZE,随机);对于(整数大小:大小){报告结果(大小,测试(大小,TRIES_PER_STRING_SIZE,随机));}//逆序或字符串大小Collections.reverse(sizes);System.out.println("");System.out.println(检查每个字符的速度以纳秒为单位.");System.out.printf("==== BACKWARDS (尝试每个大小:%s) ==== 
", TRIES_PER_STRING_SIZE);打印标题(TRIES_PER_STRING_SIZE,随机);对于(整数大小:大小){报告结果(大小,测试(大小,TRIES_PER_STRING_SIZE,随机));}}/////////检查内容的方法///一个字符串.始终检查///空格 (CHAR <=' ')////////检查字符串内容int charAtMethod1(最终字符串数据){最终 int len = data.length();for (int i = 0; i < len; i++) {如果 (data.charAt(i) <= ' ') {投掷();}}返回 len;}//同上,但使用 String.length()//而不是创建一个新的最终本地 intint charAtMethod2(最终字符串数据){for (int i = 0; i < data.length(); i++) {如果 (data.charAt(i) <= ' ') {投掷();}}返回数据长度();}//使用新的 Java-8 字符串的 IntStream//传递一个 PREDICATE 来做检查int streamMethod(最终字符串数据,最终 IntPredicate 谓词){如果(data.chars().anyMatch(谓词)){投掷();}返回数据长度();}//OH LA LA - 平行!!!int streamParallelMethod(最终字符串数据,IntPredicate谓词){if (data.chars().parallel().anyMatch(predicate)) {投掷();}返回数据长度();}//用内容重新填充可重复使用的 char[]//字符串的 char[]int重用BuffMethod(最终字符[]可重用,最终字符串数据){最终 int len = data.length();data.getChars(0, len, reusable, 0);for (int i = 0; i < len; i++) {如果(可重用 [i] <= ' '){投掷();}}返回 len;}//从 String 中获取 char[] 的新副本int newMethod1(最终字符串数据){最终 int len = data.length();最终 char[] copy = data.toCharArray();for (int i = 0; i < len; i++) {如果(复制 [i] <= ' '){投掷();}}返回 len;}//从 String 中获取 char[] 的新副本//但使用 FOR-EACHint newMethod2(最终字符串数据){for (final char c : data.toCharArray()) {如果 (c <= ' ') {投掷();}}返回数据长度();}//想要!//获取访问字符串的字段//内部字符[]int fieldMethod1(最终字段字段,最终字符串数据){尝试 {final char[] chars = (char[]) field.get(data);最终 int len = chars.length;for (int i = 0; i < len; i++) {如果(字符 [i] <= ' '){投掷();}}返回 len;} 捕捉(异常前){throw new RuntimeException(ex);}}//同上,但使用 FOR-EACHint fieldMethod2(最终字段字段,最终字符串数据){最终 char[] 字符;尝试 {chars = (char[]) field.get(data);} 捕捉(异常前){throw new RuntimeException(ex);}for (final char c : chars) {如果 (c <= ' ') {投掷();}}返回字符长度;}/**** 列出测试清单.我们将重复洗牌此列表的副本* 当我们重复这个测试时.** @param 数据* @返回*/列表makeTests(String data) 抛出异常 {//列出测试列表最终名单<Jobber>测试 = 新的 ArrayList();测试.添加(新工作人员(charAt1"){内部检查(){返回 charAtMethod1(data);}});测试.添加(新工作人员(charAt2"){内部检查(){返回 charAtMethod2(data);}});测试.添加(新工作人员(流"){最终 IntPredicate 谓词 = new IntPredicate() {公共布尔测试(整数值){返回值 <= ' ';}};内部检查(){返回流方法(数据,谓词);}});测试.添加(新的工作人员(streamPar"){最终 IntPredicate 谓词 = new IntPredicate() {公共布尔测试(整数值){返回值 <= ' ';}};内部检查(){返回流并行方法(数据,谓词);}});//可重用的 char[] 方法测试.添加(新工作人员(重用"){最终字符[] cbuff = 新字符[MAX_STRING_SIZE];内部检查(){返回重用BuffMethod(cbuff,数据);}});//从字符串中新建 char[]测试.添加(新工作人员(新1"){内部检查(){返回 newMethod1(data);}});//从字符串中新建 char[]测试.添加(新工作人员(新2"){内部检查(){返回 newMethod2(data);}});//使用反射进行字段访问测试.添加(新工作人员(field1"){最终字段;{field = String.class.getDeclaredField("value");field.setAccessible(true);}内部检查(){返回 fieldMethod1(field, data);}});//使用反射进行字段访问tests.add(new Jobber(field2") {最终字段;{field = String.class.getDeclaredField("value");field.setAccessible(true);}内部检查(){返回 fieldMethod2(field, data);}});返回测试;}/*** 我们使用这个类来跟踪测试结果*/抽象类 Jobber {最终字符串名称;长纳米;长字符;长跑;Jobber(字符串名称){this.name = 名称;}抽象 int check();最后的双 nanosPerChar() {double charsPerRun = 字符数/运行数;长 nanosPerRun = 纳米/运行;返回 charsPerRun == 0 ?nanosPerRun : nanosPerRun/charsPerRun;}最终无效运行(){运行++;长时间 = System.nanoTime();字符 += 检查();nanos += System.nanoTime() - 时间;}}//做一个随机字符 A-Z 的测试串private String makeTestString(int testSize, char start, char end) {随机 r = 新随机();char[] 数据 = 新字符 [testSize];for (int i = 0; i < data.length; i++) {data[i] = (char) (start + r.nextInt(end));}返回新字符串(数据);}//如果我们在字符串中发现非法字符,我们就会这样做公共无效 doThrow() {throw new RuntimeException("Bzzzt -- 非法字符!!");}/*** 1. 获取正确长度的随机字符串 2. 获取测试 (List) 3.* 重复执行测试,每次都洗牌*/列表测试(整数大小,整数尝试,随机随机)抛出异常{String data = makeTestString(size, 'A', 'Z');列表测试 = makeTests(数据);列表copy = new ArrayList<>(测试);而(尝试 - > 0){Collections.shuffle(复制,随机);for (Jobber ti : 复制) {ti.run();}}//检查以确保所有字符计数相同长时间运行 = tests.get(0).runs;long count = tests.get(0).chars;for (Jobber ti : 测试) {if (ti.runs !=runs && ti.chars != count) {throw new Exception(如果所有正确的算法,字符计数应该匹配");}}返回测试;}private void printHeadings(final int TRIES_PER_STRING_SIZE, final Random random) 抛出异常 {System.out.print(尺寸");for (Jobber ti : test(0, TRIES_PER_STRING_SIZE, random)) {System.out.printf(%9s", ti.name);}System.out.println("");}private void reportResults(int size, List 测试) {System.out.printf(%6d", size);for (Jobber ti : 测试) {System.out.printf("%,9.2f", ti.nanosPerChar());}System.out.println("");}}

In Java, what would the fastest way to iterate over all the chars in a String, this:

String str = "a really, really long string";
for (int i = 0, n = str.length(); i < n; i++) {
    char c = str.charAt(i);
}

Or this:

char[] chars = str.toCharArray();
for (int i = 0, n = chars.length; i < n; i++) {
    char c = chars[i];
}

EDIT :

What I'd like to know is if the cost of repeatedly calling the charAt method during a long iteration ends up being either less than or greater than the cost of performing a single call to toCharArray at the beginning and then directly accessing the array during the iteration.

It'd be great if someone could provide a robust benchmark for different string lengths, having in mind JIT warm-up time, JVM start-up time, etc. and not just the difference between two calls to System.currentTimeMillis().

解决方案

FIRST UPDATE: Before you try this ever in a production environment (not advised), read this first: http://www.javaspecialists.eu/archive/Issue237.html Starting from Java 9, the solution as described won't work anymore, because now Java will store strings as byte[] by default.

SECOND UPDATE: As of 2016-10-25, on my AMDx64 8core and source 1.8, there is no difference between using 'charAt' and field access. It appears that the jvm is sufficiently optimized to inline and streamline any 'string.charAt(n)' calls.

THIRD UPDATE: As of 2020-09-07, on my Ryzen 1950-X 16 core and source 1.14, 'charAt1' is 9 times slower than field access and 'charAt2' is 4 times slower than field access. Field access is back as the clear winner. Note than the program will need to use byte[] access for Java 9+ version jvms.

It all depends on the length of the String being inspected. If, as the question says, it is for long strings, the fastest way to inspect the string is to use reflection to access the backing char[] of the string.

A fully randomized benchmark with JDK 8 (win32 and win64) on an 64 AMD Phenom II 4 core 955 @ 3.2 GHZ (in both client mode and server mode) with 9 different techniques (see below!) shows that using String.charAt(n) is the fastest for small strings and that using reflection to access the String backing array is almost twice as fast for large strings.

THE EXPERIMENT

  • 9 different optimization techniques are tried.

  • All string contents are randomized

  • The test are done for string sizes in multiples of two starting with 0,1,2,4,8,16 etc.

  • The tests are done 1,000 times per string size

  • The tests are shuffled into random order each time. In other words, the tests are done in random order every time they are done, over 1000 times over.

  • The entire test suite is done forwards, and backwards, to show the effect of JVM warmup on optimization and times.

  • The entire suite is done twice, once in -client mode and the other in -server mode.

CONCLUSIONS

-client mode (32 bit)

For strings 1 to 256 characters in length, calling string.charAt(i) wins with an average processing of 13.4 million to 588 million characters per second.

Also, it is overall 5.5% faster (client) and 13.9% (server) like this:

    for (int i = 0; i < data.length(); i++) {
        if (data.charAt(i) <= ' ') {
            doThrow();
        }
    }

than like this with a local final length variable:

    final int len = data.length();
    for (int i = 0; i < len; i++) {
        if (data.charAt(i) <= ' ') {
            doThrow();
        }
    }

For long strings, 512 to 256K characters length, using reflection to access the String's backing array is fastest. This technique is almost twice as fast as String.charAt(i) (178% faster). The average speed over this range was 1.111 billion characters per second.

The Field must be obtained ahead of time and then it can be re-used in the library on different strings. Interestingly, unlike the code above, with Field access, it is 9% faster to have a local final length variable than to use 'chars.length' in the loop check. Here is how Field access can be setup as fastest:

   final Field field = String.class.getDeclaredField("value");
   field.setAccessible(true);

   try {
       final char[] chars = (char[]) field.get(data);
       final int len = chars.length;
       for (int i = 0; i < len; i++) {
           if (chars[i] <= ' ') {
               doThrow();
           }
       }
       return len;
   } catch (Exception ex) {
       throw new RuntimeException(ex);
   }

Special comments on -server mode

Field access starting winning after 32 character length strings in server mode on a 64 bit Java machine on my AMD 64 machine. That was not seen until 512 characters length in client mode.

Also worth noting I think, when I was running JDK 8 (32 bit build) in server mode, the overall performance was 7% slower for both large and small strings. This was with build 121 Dec 2013 of JDK 8 early release. So, for now, it seems that 32 bit server mode is slower than 32 bit client mode.

That being said ... it seems the only server mode that is worth invoking is on a 64 bit machine. Otherwise it actually hampers performance.

For 32 bit build running in -server mode on an AMD64, I can say this:

  1. String.charAt(i) is the clear winner overall. Although between sizes 8 to 512 characters there were winners among 'new' 'reuse' and 'field'.
  2. String.charAt(i) is 45% faster in client mode
  3. Field access is twice as fast for large Strings in client mode.

Also worth saying, String.chars() (Stream and the parallel version) are a bust. Way slower than any other way. The Streams API is a rather slow way to perform general string operations.

Wish List

Java String could have predicate accepting optimized methods such as contains(predicate), forEach(consumer), forEachWithIndex(consumer). Thus, without the need for the user to know the length or repeat calls to String methods, these could help parsing libraries beep-beep beep speedup.

Keep dreaming :)

Happy Strings!

~SH

The test used the following 9 methods of testing the string for the presence of whitespace:

"charAt1" -- CHECK THE STRING CONTENTS THE USUAL WAY:

int charAtMethod1(final String data) {
    final int len = data.length();
    for (int i = 0; i < len; i++) {
        if (data.charAt(i) <= ' ') {
            doThrow();
        }
    }
    return len;
}

"charAt2" -- SAME AS ABOVE BUT USE String.length() INSTEAD OF MAKING A FINAL LOCAL int FOR THE LENGTh

int charAtMethod2(final String data) {
    for (int i = 0; i < data.length(); i++) {
        if (data.charAt(i) <= ' ') {
            doThrow();
        }
    }
    return data.length();
}

"stream" -- USE THE NEW JAVA-8 String's IntStream AND PASS IT A PREDICATE TO DO THE CHECKING

int streamMethod(final String data, final IntPredicate predicate) {
    if (data.chars().anyMatch(predicate)) {
        doThrow();
    }
    return data.length();
}

"streamPara" -- SAME AS ABOVE, BUT OH-LA-LA - GO PARALLEL!!!

// avoid this at all costs
int streamParallelMethod(final String data, IntPredicate predicate) {
    if (data.chars().parallel().anyMatch(predicate)) {
        doThrow();
    }
    return data.length();
}

"reuse" -- REFILL A REUSABLE char[] WITH THE STRINGS CONTENTS

int reuseBuffMethod(final char[] reusable, final String data) {
    final int len = data.length();
    data.getChars(0, len, reusable, 0);
    for (int i = 0; i < len; i++) {
        if (reusable[i] <= ' ') {
            doThrow();
        }
    }
    return len;
}

"new1" -- OBTAIN A NEW COPY OF THE char[] FROM THE STRING

int newMethod1(final String data) {
    final int len = data.length();
    final char[] copy = data.toCharArray();
    for (int i = 0; i < len; i++) {
        if (copy[i] <= ' ') {
            doThrow();
        }
    }
    return len;
}

"new2" -- SAME AS ABOVE, BUT USE "FOR-EACH"

int newMethod2(final String data) {
    for (final char c : data.toCharArray()) {
        if (c <= ' ') {
            doThrow();
        }
    }
    return data.length();
}

"field1" -- FANCY!! OBTAIN FIELD FOR ACCESS TO THE STRING'S INTERNAL char[]

int fieldMethod1(final Field field, final String data) {
    try {
        final char[] chars = (char[]) field.get(data);
        final int len = chars.length;
        for (int i = 0; i < len; i++) {
            if (chars[i] <= ' ') {
                doThrow();
            }
        }
        return len;
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
}

"field2" -- SAME AS ABOVE, BUT USE "FOR-EACH"

int fieldMethod2(final Field field, final String data) {
    final char[] chars;
    try {
        chars = (char[]) field.get(data);
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
    for (final char c : chars) {
        if (c <= ' ') {
            doThrow();
        }
    }
    return chars.length;
}

COMPOSITE RESULTS FOR CLIENT -client MODE (forwards and backwards tests combined)

Note: that the -client mode with Java 32 bit and -server mode with Java 64 bit are the same as below on my AMD64 machine.

Size     WINNER  charAt1 charAt2  stream streamPar   reuse    new1    new2  field1  field2
1        charAt    77.0     72.0   462.0     584.0   127.5    89.5    86.0   159.5   165.0
2        charAt    38.0     36.5   284.0   32712.5    57.5    48.3    50.3    89.0    91.5
4        charAt    19.5     18.5   458.6    3169.0    33.0    26.8    27.5    54.1    52.6
8        charAt     9.8      9.9   100.5    1370.9    17.3    14.4    15.0    26.9    26.4
16       charAt     6.1      6.5    73.4     857.0     8.4     8.2     8.3    13.6    13.5
32       charAt     3.9      3.7    54.8     428.9     5.0     4.9     4.7     7.0     7.2
64       charAt     2.7      2.6    48.2     232.9     3.0     3.2     3.3     3.9     4.0
128      charAt     2.1      1.9    43.7     138.8     2.1     2.6     2.6     2.4     2.6
256      charAt     1.9      1.6    42.4      90.6     1.7     2.1     2.1     1.7     1.8
512      field1     1.7      1.4    40.6      60.5     1.4     1.9     1.9     1.3     1.4
1,024    field1     1.6      1.4    40.0      45.6     1.2     1.9     2.1     1.0     1.2
2,048    field1     1.6      1.3    40.0      36.2     1.2     1.8     1.7     0.9     1.1
4,096    field1     1.6      1.3    39.7      32.6     1.2     1.8     1.7     0.9     1.0
8,192    field1     1.6      1.3    39.6      30.5     1.2     1.8     1.7     0.9     1.0
16,384   field1     1.6      1.3    39.8      28.4     1.2     1.8     1.7     0.8     1.0
32,768   field1     1.6      1.3    40.0      26.7     1.3     1.8     1.7     0.8     1.0
65,536   field1     1.6      1.3    39.8      26.3     1.3     1.8     1.7     0.8     1.0
131,072  field1     1.6      1.3    40.1      25.4     1.4     1.9     1.8     0.8     1.0
262,144  field1     1.6      1.3    39.6      25.2     1.5     1.9     1.9     0.8     1.0

COMPOSITE RESULTS FOR SERVER -server MODE (forwards and backwards tests combined)

Note: this is the test for Java 32 bit running in server mode on an AMD64. The server mode for Java 64 bit was the same as Java 32 bit in client mode except that Field access starting winning after 32 characters size.

Size     WINNER  charAt1 charAt2  stream streamPar   reuse    new1    new2  field1  field2
1        charAt     74.5    95.5   524.5     783.0    90.5   102.5    90.5   135.0   151.5
2        charAt     48.5    53.0   305.0   30851.3    59.3    57.5    52.0    88.5    91.8
4        charAt     28.8    32.1   132.8    2465.1    37.6    33.9    32.3    49.0    47.0
8          new2     18.0    18.6    63.4    1541.3    18.5    17.9    17.6    25.4    25.8
16         new2     14.0    14.7   129.4    1034.7    12.5    16.2    12.0    16.0    16.6
32         new2      7.8     9.1    19.3     431.5     8.1     7.0     6.7     7.9     8.7
64        reuse      6.1     7.5    11.7     204.7     3.5     3.9     4.3     4.2     4.1
128       reuse      6.8     6.8     9.0     101.0     2.6     3.0     3.0     2.6     2.7
256      field2      6.2     6.5     6.9      57.2     2.4     2.7     2.9     2.3     2.3
512       reuse      4.3     4.9     5.8      28.2     2.0     2.6     2.6     2.1     2.1
1,024    charAt      2.0     1.8     5.3      17.6     2.1     2.5     3.5     2.0     2.0
2,048    charAt      1.9     1.7     5.2      11.9     2.2     3.0     2.6     2.0     2.0
4,096    charAt      1.9     1.7     5.1       8.7     2.1     2.6     2.6     1.9     1.9
8,192    charAt      1.9     1.7     5.1       7.6     2.2     2.5     2.6     1.9     1.9
16,384   charAt      1.9     1.7     5.1       6.9     2.2     2.5     2.5     1.9     1.9
32,768   charAt      1.9     1.7     5.1       6.1     2.2     2.5     2.5     1.9     1.9
65,536   charAt      1.9     1.7     5.1       5.5     2.2     2.4     2.4     1.9     1.9
131,072  charAt      1.9     1.7     5.1       5.4     2.3     2.5     2.5     1.9     1.9
262,144  charAt      1.9     1.7     5.1       5.1     2.3     2.5     2.5     1.9     1.9

FULL RUNNABLE PROGRAM CODE

(to test on Java 7 and earlier, remove the two streams tests)

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.IntPredicate;

/**
 * @author Saint Hill <http://stackoverflow.com/users/1584255/saint-hill>
 */
public final class TestStrings {

    // we will not test strings longer than 512KM
    final int MAX_STRING_SIZE = 1024 * 256;

    // for each string size, we will do all the tests
    // this many times
    final int TRIES_PER_STRING_SIZE = 1000;

    public static void main(String[] args) throws Exception {
        new TestStrings().run();
    }

    void run() throws Exception {

        // double the length of the data until it reaches MAX chars long
        // 0,1,2,4,8,16,32,64,128,256 ... 
        final List<Integer> sizes = new ArrayList<>();
        for (int n = 0; n <= MAX_STRING_SIZE; n = (n == 0 ? 1 : n * 2)) {
            sizes.add(n);
        }

        // CREATE RANDOM (FOR SHUFFLING ORDER OF TESTS)
        final Random random = new Random();

        System.out.println("Rate in nanoseconds per character inspected.");
        System.out.printf("==== FORWARDS (tries per size: %s) ==== 
", TRIES_PER_STRING_SIZE);

        printHeadings(TRIES_PER_STRING_SIZE, random);

        for (int size : sizes) {
            reportResults(size, test(size, TRIES_PER_STRING_SIZE, random));
        }

        // reverse order or string sizes
        Collections.reverse(sizes);

        System.out.println("");
        System.out.println("Rate in nanoseconds per character inspected.");
        System.out.printf("==== BACKWARDS (tries per size: %s) ==== 
", TRIES_PER_STRING_SIZE);

        printHeadings(TRIES_PER_STRING_SIZE, random);

        for (int size : sizes) {
            reportResults(size, test(size, TRIES_PER_STRING_SIZE, random));

        }
    }

    ///
    ///
    ///  METHODS OF CHECKING THE CONTENTS
    ///  OF A STRING. ALWAYS CHECKING FOR
    ///  WHITESPACE (CHAR <=' ')
    ///  
    ///
    // CHECK THE STRING CONTENTS
    int charAtMethod1(final String data) {
        final int len = data.length();
        for (int i = 0; i < len; i++) {
            if (data.charAt(i) <= ' ') {
                doThrow();
            }
        }
        return len;
    }

    // SAME AS ABOVE BUT USE String.length()
    // instead of making a new final local int 
    int charAtMethod2(final String data) {
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) <= ' ') {
                doThrow();
            }
        }
        return data.length();
    }

    // USE new Java-8 String's IntStream
    // pass it a PREDICATE to do the checking
    int streamMethod(final String data, final IntPredicate predicate) {
        if (data.chars().anyMatch(predicate)) {
            doThrow();
        }
        return data.length();
    }

    // OH LA LA - GO PARALLEL!!!
    int streamParallelMethod(final String data, IntPredicate predicate) {
        if (data.chars().parallel().anyMatch(predicate)) {
            doThrow();
        }
        return data.length();
    }

    // Re-fill a resuable char[] with the contents
    // of the String's char[]
    int reuseBuffMethod(final char[] reusable, final String data) {
        final int len = data.length();
        data.getChars(0, len, reusable, 0);
        for (int i = 0; i < len; i++) {
            if (reusable[i] <= ' ') {
                doThrow();
            }
        }
        return len;
    }

    // Obtain a new copy of char[] from String
    int newMethod1(final String data) {
        final int len = data.length();
        final char[] copy = data.toCharArray();
        for (int i = 0; i < len; i++) {
            if (copy[i] <= ' ') {
                doThrow();
            }
        }
        return len;
    }

    // Obtain a new copy of char[] from String
    // but use FOR-EACH
    int newMethod2(final String data) {
        for (final char c : data.toCharArray()) {
            if (c <= ' ') {
                doThrow();
            }
        }
        return data.length();
    }

    // FANCY!
    // OBTAIN FIELD FOR ACCESS TO THE STRING'S
    // INTERNAL CHAR[]
    int fieldMethod1(final Field field, final String data) {
        try {
            final char[] chars = (char[]) field.get(data);
            final int len = chars.length;
            for (int i = 0; i < len; i++) {
                if (chars[i] <= ' ') {
                    doThrow();
                }
            }
            return len;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    // same as above but use FOR-EACH
    int fieldMethod2(final Field field, final String data) {
        final char[] chars;
        try {
            chars = (char[]) field.get(data);
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
        for (final char c : chars) {
            if (c <= ' ') {
                doThrow();
            }
        }
        return chars.length;
    }

    /**
     *
     * Make a list of tests. We will shuffle a copy of this list repeatedly
     * while we repeat this test.
     *
     * @param data
     * @return
     */
    List<Jobber> makeTests(String data) throws Exception {
        // make a list of tests
        final List<Jobber> tests = new ArrayList<Jobber>();

        tests.add(new Jobber("charAt1") {
            int check() {
                return charAtMethod1(data);
            }
        });

        tests.add(new Jobber("charAt2") {
            int check() {
                return charAtMethod2(data);
            }
        });

        tests.add(new Jobber("stream") {
            final IntPredicate predicate = new IntPredicate() {
                public boolean test(int value) {
                    return value <= ' ';
                }
            };

            int check() {
                return streamMethod(data, predicate);
            }
        });

        tests.add(new Jobber("streamPar") {
            final IntPredicate predicate = new IntPredicate() {
                public boolean test(int value) {
                    return value <= ' ';
                }
            };

            int check() {
                return streamParallelMethod(data, predicate);
            }
        });

        // Reusable char[] method
        tests.add(new Jobber("reuse") {
            final char[] cbuff = new char[MAX_STRING_SIZE];

            int check() {
                return reuseBuffMethod(cbuff, data);
            }
        });

        // New char[] from String
        tests.add(new Jobber("new1") {
            int check() {
                return newMethod1(data);
            }
        });

        // New char[] from String
        tests.add(new Jobber("new2") {
            int check() {
                return newMethod2(data);
            }
        });

        // Use reflection for field access
        tests.add(new Jobber("field1") {
            final Field field;

            {
                field = String.class.getDeclaredField("value");
                field.setAccessible(true);
            }

            int check() {
                return fieldMethod1(field, data);
            }
        });

        // Use reflection for field access
        tests.add(new Jobber("field2") {
            final Field field;

            {
                field = String.class.getDeclaredField("value");
                field.setAccessible(true);
            }

            int check() {
                return fieldMethod2(field, data);
            }
        });

        return tests;
    }

    /**
     * We use this class to keep track of test results
     */
    abstract class Jobber {

        final String name;
        long nanos;
        long chars;
        long runs;

        Jobber(String name) {
            this.name = name;
        }

        abstract int check();

        final double nanosPerChar() {
            double charsPerRun = chars / runs;
            long nanosPerRun = nanos / runs;
            return charsPerRun == 0 ? nanosPerRun : nanosPerRun / charsPerRun;
        }

        final void run() {
            runs++;
            long time = System.nanoTime();
            chars += check();
            nanos += System.nanoTime() - time;
        }
    }

    // MAKE A TEST STRING OF RANDOM CHARACTERS A-Z
    private String makeTestString(int testSize, char start, char end) {
        Random r = new Random();
        char[] data = new char[testSize];
        for (int i = 0; i < data.length; i++) {
            data[i] = (char) (start + r.nextInt(end));
        }
        return new String(data);
    }

    // WE DO THIS IF WE FIND AN ILLEGAL CHARACTER IN THE STRING
    public void doThrow() {
        throw new RuntimeException("Bzzzt -- Illegal Character!!");
    }

    /**
     * 1. get random string of correct length 2. get tests (List<Jobber>) 3.
     * perform tests repeatedly, shuffling each time
     */
    List<Jobber> test(int size, int tries, Random random) throws Exception {
        String data = makeTestString(size, 'A', 'Z');
        List<Jobber> tests = makeTests(data);
        List<Jobber> copy = new ArrayList<>(tests);
        while (tries-- > 0) {
            Collections.shuffle(copy, random);
            for (Jobber ti : copy) {
                ti.run();
            }
        }
        // check to make sure all char counts the same
        long runs = tests.get(0).runs;
        long count = tests.get(0).chars;
        for (Jobber ti : tests) {
            if (ti.runs != runs && ti.chars != count) {
                throw new Exception("Char counts should match if all correct algorithms");
            }
        }
        return tests;
    }

    private void printHeadings(final int TRIES_PER_STRING_SIZE, final Random random) throws Exception {
        System.out.print("  Size");
        for (Jobber ti : test(0, TRIES_PER_STRING_SIZE, random)) {
            System.out.printf("%9s", ti.name);
        }
        System.out.println("");
    }

    private void reportResults(int size, List<Jobber> tests) {
        System.out.printf("%6d", size);
        for (Jobber ti : tests) {
            System.out.printf("%,9.2f", ti.nanosPerChar());
        }
        System.out.println("");
    }
}

这篇关于迭代字符串中所有字符的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆