最快的方式来迭代字符串中的所有字符 [英] Fastest way to iterate over all the chars in a String

查看:131
本文介绍了最快的方式来迭代字符串中的所有字符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在Java中,以最快的方式迭代字符串中的所有字符,这是最快的方法:

  String str = 真的,真长串; 
for(int i = 0,n = str.length(); i char c = str.charAt(i);
}

或者:

  char [] chars = str.toCharArray(); 
for(int i = 0,n = chars.length; i char c = chars [i];
}

EDIT: b

我想知道的是,如果在长迭代过程中重复调用 charAt 方法的成本是小于或大于比起在开始时执行对 toCharArray 的单次调用,然后在迭代期间直接访问数组的成本。



如果有人可以为不同的字符串长度提供一个鲁棒的基准,考虑JIT预热时间,JVM启动时间等等,而不只是两个调用之间的区别, System.currentTimeMillis()

解决方案

第一次更新: (不建议),请先阅读: http://www.javaspecialists.eu/archive/Issue237。 html
从Java 9开始,所描述的解决方案将不再工作,因为现在的Java默认会将字符串存储为byte []。



第二更新:截至2016年10月25日,在我的AMDx64 8core和源1.8,使用'charAt'和字段访问没有区别。看起来jvm已经被充分优化以内联和简化任何'string.charAt(n)'调用。



这一切都取决于 String 。如果,如问题所说,它是字符串,检查字符串的最快方法是使用反射访问支持 char []



在64位AMD Phenom II 4核心955 @ 3.2 GHZ上使用JDK 8(win32和win64)的完全随机基准(在客户端模式和服务器模式)使用9种不同的技术(见下文!)表明,使用 String.charAt(n)是小字符串最快的,并且使用



实验









$ b

  • 尝试9种不同的优化技术。


  • 所有字符串内容都是随机的

    li>
  • 测试从0,1,2,4,8,16等开始以二的倍数进行。


  • 每个字符串大小测试1000次。


  • 每次测试随机排序。


  • 整个测试套件是向前执行,向后执行


  • 整个套件执行两次,一次在 -client

    code>模式,另一个在 -server 模式中。



$ b b

结论



- 客户模式(32位)



对于字符串 1到256字符长度,调用 string.charAt(i)的平均处理速度为每秒1340万到588万个字符。



此外,它的总体速度比客户端快5.5%(服务器)和5.5%:

  for(int i = 0; i  if(data.charAt(i)<=''){
doThrow
}
}

/ p>

  final int len = data.length(); 
for(int i = 0; i< len; i ++){
if(data.charAt(i)<=''){
doThrow
}
}

对于长字符串, 512到256K字符length ,使用反射来访问String的支持数组是最快的。 此技术几乎是String.charAt(i)的两倍快(178%更快)。在此范围内的平均速度为每秒11.1亿个字符。



字段必须提前获取,然后可以在库中对不同字符串重复使用。有趣的是,与上面的代码不同,使用Field访问,使用局部最终长度变量比在循环检查中使用'chars.length'快9%。下面是如何以最快的速度设置字段访问:

  final字段field = String.class.getDeclaredField(value); 
field.setAccessible(true);

try {
final char [] chars =(char [])field.get(data);
final int len = chars.length;
for(int i = 0; i< len; i ++){
if(chars [i] <=''){
doThrow
}
}
return len;
} catch(Exception ex){
throw new RuntimeException(ex);
}



对服务器模式的特殊注释



在AMD 64位机器上的64位Java机器上,在服务器模式下,字段访问开始获胜32个字符长度字符串。在客户端模式下,直到512个字符长度才会看到。



还有一点值得注意,当我在服务器模式下运行JDK 8(32位构建)大和小弦的性能都降低了7%。这是与构建2013年12月121日的JDK 8早期版本。所以,现在,似乎32位服务器模式慢于32位客户端模式。



这就是说...似乎唯一的服务器模式是值得的调用是在64位机器上。



对于AMD64上的 - 服务器模式运行的32位构建,我可以说这:


  1. String.charAt(i)是整体的明显胜利。虽然大小在8到512个字符之间,但在新重用和字段中有胜出。

  2. String.charAt
  3. 对于客户端模式下的大型字符串,字段访问速度是其两倍。

还值得一提的是,String.chars ()(流和并行版本)是一个胸围。方式比任何其他方式慢。 Streams API是执行常规字符串操作的相当慢的方法。



愿望清单



Java字符串可以有谓词接受优化的方法,如contains(谓词),forEach ),forEachWithIndex(consumer)。因此,无需用户知道长度或重复调用String方法,这些可以帮助解析库 beep-beep beep 加速。



继续做梦:)



Happy Strings!



〜SH



测试使用以下9种方法测试字符串是否存在空格:



charAt1 - CHECK字符串目前的方式:

  int charAtMethod1(final String data){
final int len = data.length ();
for(int i = 0; i< len; i ++){
if(data.charAt(i)<=''){
doThrow
}
}
return len;
}

charAt2 - SAME AS ABOVE BUT USE String.length

  int charAtMethod2(final String data){
for(int i = 0; i if(data.charAt(i)<=''){
doThrow
}
}
return data.length();
}

stream - 使用新的JAVA-8 String的IntStream和PASS

  int streamMethod(final String data,final IntPredicate predicate){
if(data。 chars()。anyMatch(predicate)){
doThrow();
}
return data.length();
}

streamPara - SAME AS ABOVE,BUT OH-LA-LA - GO paralleL !!!

  //不惜一切代价避免这种情况
int streamParallelMethod(final String data,IntPredicate predicate) {
if(data.chars()。parallel()。anyMatch(predicate)){
doThrow();
}
return data.length();
}

reuse - REFILL A REUSABLE char [] with the STRINGS CONTENTS

  int reuseBuffMethod(final char [] reusable,final String data){
final int len = data.length ;
data.getChars(0,len,reusable,0);
for(int i = 0; i if(reusable [i] <=''){
doThrow
}
}
return len;
}

new1 - 获取char []的新副本STRING

  int newMethod1(final String data){
final int len = data.length();
final char [] copy = data.toCharArray();
for(int i = 0; i if(copy [i] <=''){
doThrow
}
}
return len;
}

new2 - 与上述相同,

  int newMethod2(final String data){
for(final char c:data.toCharArray()){
if(c <=''){
doThrow();
}
}
return data.length();
}

field1 - FANCY!获取字段用于访问STRING的内部char []

  int fieldMethod1(最终字段字段,最终字符串数据){
try {
final char [] chars =(char [])field.get(data);
final int len = chars.length;
for(int i = 0; i if(chars [i] <=''){
doThrow
}
}
return len;
} catch(Exception ex){
throw new RuntimeException(ex);
}
}

field2 - 与上述相同,但使用FOR-EACH

  int fieldMethod2(final Field field,final String data){
final char [] chars ;
try {
chars =(char [])field.get(data);
} catch(Exception ex){
throw new RuntimeException(ex);
}
for(final char c:chars){
if(c <=''){
doThrow();
}
}
return chars.length;
}



客户端的组合结果 -client MODE(组合向前和向后测试)



注意:使用Java 32位的-client模式和使用Java 64位的服务器模式与以下相同在我的AMD64机器上。

 大小WINNER charAt1 charAt2 stream streamPar reuse new1 new2 field1 field2 
1 charAt 77.0 72.0 462.0 584.0 127.5 89.5 86.0 159.5 165.0
2 charAt 38.0 36.5 284.0 32712.5 57.5 48.3 50.3 89.0 91.5
4特征19.5 18.5 458.6 3169.0 33.0 26.8 27.5 54.1 52.6
8特征9.8 9.9 100.5 1370.9 17.3 14.4 15.0 26.9 26.4
16 charAt 6.1 6.5 73.4 857.0 8.4 8.2 8.3 13.6 13.5
32 charAt 3.9 3.7 54.8 428.9 5.0 4.9 4.7 7.0 7.2
64 charAt 2.7 2.6 48.2 232.9 3.0 3.2 3.3 3.9 4.0
128 charAt 2.1 1.9 43.7 138.8 2.1 2.6 2.6 2.4 2.6
256 charAt 1.9 1.6 42.4 90.6 1.7 2.1 2.1 1.7 1.8
512字段1 1.7 1.4 40.6 60.5 1.4 1.9 1.9 1.3 1.4
1,024字段1 1.6 1.4 40.0 45.6 1.2 1.9 2.1 1.0 1.2
2,048 field1 1.6 1.3 40.0 36.2 1.2 1.8 1.7 0.9 1.1
4,096 field1 1.6 1.3 39.7 32.6 1.2 1.8 1.7 0.9 1.0
8,192 field1 1.6 1.3 39.6 30.5 1.2 1.8 1.7 0.9 1.0
16,384 field1 1.6 1.3 39.8 28.4 1.2 1.8 1.7 0.8 1.0
32,768 field1 1.6 1.3 40.0 26.7 1.3 1.8 1.7 0.8 1.0
65,536 field1 1.6 1.3 39.8 26.3 1.3 1.8 1.7 0.8 1.0
131,072 field1 1.6 1.3 40.1 25.4 1.4 1.9 1.8 0.8 1.0
262,144 field1 1.6 1.3 39.6 25.2 1.5 1.9 1.9 0.8 1.0


$ b b

服务器的组合结果 -server MODE(组合正向和反向测试)



注意:测试在服务器模式下在AMD64上运行的32位Java 32位。 Java 64位的服务器模式与客户机模式中的Java 32位相同,除了字段访问开始在32个字符大小之后开始。

  Size WINNER charAt1 charAt2 stream streamPar reuse new1 new2 field1 field2 
1 charAt 74.5 95.5 524.5 783.0 90.5 102.5 90.5 135.0 151.5
2 charAt 48.5 53.0 305.0 30851.3 59.3 57.5 52.0 88.5 91.8
4 charAt 28.8 32.1 132.8 2465.1 37.6 33.9 32.3 49.0 47.0
8新2 18.0 18.6 63.4 1541.3 18.5 17.9 17.6 25.4 25.8
16新2 14.0 14.7 129.4 1034.7 12.5 16.2 12.0 16.0 16.6
32新2 7.8 9.1 19.3 431.5 8.1 7.0 6.7 7.9 8.7
64重复使用6.1 7.5 11.7 204.7 3.5 3.9 4.3 4.2 4.1
128重复使用6.8 6.8 9.0 101.0 2.6 3.0 3.0 2.6 2.7
256字段2 6.2 6.5 6.9 57.2 2.4 2.7 2.9 2.3 2.3
512重复使用4.3 4.9 5.8 28.2 2.0 2.6 2.6 2.1 2.1
1,024 charAt 2.0 1.8 5.3 17.6 2.1 2.5 3.5 2.0 2.0
2,048 charAt 1.9 1.7 5.2 11.9 2.2 3.0 2.6 2.0 2.0
4,096 charAt 1.9 1.7 5.1 8.7 2.1 2.6 2.6 1.9 1.9
8,192字符1.9 1.7 5.1 7.6 2.2 2.5 2.6 1.9 1.9
16,384字符1.9 1.7 5.1 6.9 2.2 2.5 2.5 1.9 1.9
32,768字符1.9 1.7 5.1 6.1 2.2 2.5 2.5 1.9 1.9
65,536 charAt 1.9 1.7 5.1 5.5 2.2 2.4 2.4 1.9 1.9
131,072 charAt 1.9 1.7 5.1 5.4 2.3 2.5 2.5 1.9 1.9
262,144 charAt 1.9 1.7 5.1 5.1 2.3 2.5 2.5 1.9 1.9 b $ b



FULL RUNNABLE PROGRAM CODE



Java 7和更早版本,删除两个流测试)

  import java.lang.reflect.Field; 
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.IntPredicate;

/ **
* @author Saint Hill< http://stackoverflow.com/users/1584255/saint-hill>
* /
public final class TestStrings {

//我们不会测试超过512KM的字符串
final int MAX_STRING_SIZE = 1024 * 256;

//对于每个字符串大小,我们将执行所有的测试
//这么多次
final int TRIES_PER_STRING_SIZE = 1000;

public static void main(String [] args)throws Exception {
new TestStrings()。run();
}

void run()throws Exception {

//将数据长度加倍,直到达到MAX chars long
// 0, 1,2,4,8,16,32,64,128,256 ...
final List< Integer> sizes = new ArrayList<>();
for(int n = 0; n <= MAX_STRING_SIZE; n =(n == 0?1:n * 2)){
sizes.add(n);
}

//创建随机(用于测试的顺序)
final随机随机= new Random();

System.out.println(以纳秒为单位检查的字符率);
System.out.printf(==== FORWARDS(每次尝试大小:%s)==== \\\
,TRIES_PER_STRING_SIZE);

printHeadings(TRIES_PER_STRING_SIZE,random);

for(int size:sizes){
reportResults(size,test(size,TRIES_PER_STRING_SIZE,random));
}

//反向顺序或字符串大小
Collections.reverse(sizes);

System.out.println();
System.out.println(以纳秒为单位检查的字符率);
System.out.printf(==== BACKWARDS(attempts per size:%s)==== \\\
,TRIES_PER_STRING_SIZE);

printHeadings(TRIES_PER_STRING_SIZE,random);

for(int size:sizes){
reportResults(size,test(size,TRIES_PER_STRING_SIZE,random));

}
}

///
///
///检查内容的方法
// / OF A STRING。 ALWAYS CHECKING FOR
/// WHITESPACE(CHAR <='')
///
///
//检查字符串内容
int charAtMethod1 final String data){
final int len = data.length();
for(int i = 0; i< len; i ++){
if(data.charAt(i)<=''){
doThrow
}
}
return len;
}

//与上一个相同使用String.length()
//而不是新的最终局部int
int charAtMethod2(final String data) {
for(int i = 0; i if(data.charAt(i)<=''){
doThrow ;
}
}
return data.length();
}

//使用新的Java-8 String的IntStream
//传递一个PREDICATE来做检查
int streamMethod(final String data,final IntPredicate predicate ){
if(data.chars()。anyMatch(predicate)){
doThrow();
}
return data.length();
}

// OH LA LA - GO PARALLEL!
int streamParallelMethod(final String data,IntPredicate predicate){
if(data.chars()。parallel()。anyMatch(predicate)){
doThrow
}
return data.length();
}

//使用内容
重新填充一个resuable char [] String的char []
的重复使用int重复使用BuffMethod(final char [] reusable ,final String data){
final int len = data.length();
data.getChars(0,len,reusable,0);
for(int i = 0; i if(reusable [i] <=''){
doThrow
}
}
return len;
}

//从String获取char []的新副本
int newMethod1(final String data){
final int len = data.length() ;
final char [] copy = data.toCharArray();
for(int i = 0; i if(copy [i] <=''){
doThrow
}
}
return len;
}

//从String
//获取char []的新副本,但使用FOR-EACH
int newMethod2(final String data){
for(final char c:data.toCharArray()){
if(c <=''){
doThrow();
}
}
return data.length();
}

// FANCY!
//获取STRING的字段
//内部CHAR []
int fieldMethod1(最终字段字段,最终字符串数据){
try {
final char [] chars =(char [])field.get(data);
final int len = chars.length;
for(int i = 0; i if(chars [i] <=''){
doThrow
}
}
return len;
} catch(Exception ex){
throw new RuntimeException(ex);
}
}

//同上,但使用FOR-EACH
int fieldMethod2(final Field field,final String data){
final char [] chars;
try {
chars =(char [])field.get(data);
} catch(Exception ex){
throw new RuntimeException(ex);
}
for(final char c:chars){
if(c <=''){
doThrow();
}
}
return chars.length;
}

/ **
*
*列出测试列表。我们将重复这个列表的副本
*,当我们重复这个测试。
*
* @param data
* @return
* /
List< Jobber> makeTests(String data)throws Exception {
//创建测试列表
final List< Jobber> tests = new ArrayList< Jobber>();

tests.add(new Jobber(charAt1){
int check(){
return charAtMethod1(data);
}
} ;

tests.add(new Jobber(charAt2){
int check(){
return charAtMethod2(data);
}
} ;

test.add(new Jobber(stream){
final IntPredicate predicate = new IntPredicate(){
public boolean test(int value){
return value< ='';
}
};

int check(){
return streamMethod(data,predicate);
}
});

test.add(new Jobber(streamPar){
final IntPredicate predicate = new IntPredicate(){
public boolean test(int value){
return value< ='';
}
};

int check(){
return streamParallelMethod(data,predicate);
}
});

//可重用的char []方法
tests.add(new Jobber(reuse){
final char [] cbuff = new char [MAX_STRING_SIZE];

int check(){
return reuseBuffMethod(cbuff,data);
}
});

//来自String的新的char []
tests.add(new Jobber(new1){
int check(){
return newMethod1(data) ;
}
});

//来自String的新的char []
tests.add(new Jobber(new2){
int check(){
return newMethod2(data) ;
}
});

//使用反射进行字段访问
tests.add(new Jobber(field1){
final字段字段;

{
field = String.class.getDeclaredField(value);
field.setAccessible(true);
}

int check(){
return fieldMethod1 (field,data);
}
});

//使用反射进行字段访问
tests.add(new Jobber(field2){
final字段字段;

{
field = String.class.getDeclaredField(value);
field.setAccessible(true);
}

int check(){
return fieldMethod2 (字段,数据);
}
});

返回测试;
}

/ **
*我们使用这个类来跟踪测试结果
* /
抽象类Jobber {

final String name;
long nanos;
long chars;
long run;

Jobber(String name){
this.name = name;
}

abstract int check();

final double nanosPerChar(){
double charsPerRun = chars / runs;
long nanosPerRun = nanos / runs;
return charsPerRun == 0? nanosPerRun:nanosPerRun / charsPerRun;
}

final void run(){
runs ++;
long time = System.nanoTime();
chars + = check();
nanos + = System.nanoTime() - time;
}
}

//测试随机字符串AZ
private String makeTestString(int testSize,char start,char end){
Random r = new Random();
char [] data = new char [testSize];
for(int i = 0; i data [i] =(char)(start + r.nextInt(end));
}
return new String(data);
}

//如果我们在字符串中发现一个非法字符,我们会这么做的
public void doThrow(){
throw new RuntimeException(Bzzzt - Illegal字符!!);
}

/ **
* 1.获取正确长度的随机字符串2. get tests(List< Jobber>)3.
*每次洗牌
* /
List< Jobber>测试(int size,int,随机随机)throws异常{
String data = makeTestString(size,'A','Z');
List< Jobber> tests = makeTests(data);
List< Jobber> copy = new ArrayList<>(tests);
while(attempts-> 0){
Collections.shuffle(copy,random);
(Jobber ti:copy){
ti.run();
}
}
//检查以确保所有字符计数相同
long runs = tests.get(0).runs;
long count = tests.get(0).chars;
for(Jobber ti:tests){
if(ti.runs!= runs& ti.chars!= count){
throw new Exception(Char counts should match if所有正确算法);
}
}
返回测试;
}

private void printHeadings(final int TRIES_PER_STRING_SIZE,final随机随机)throws Exception {
System.out.print(Size);
for(Jobber ti:test(0,TRIES_PER_STRING_SIZE,random)){
System.out.printf(%9s,ti.name);
}
System.out.println();
}

private void reportResults(int size,List< Jobber> tests){
System.out.printf(%6d,size);
for(Jobber ti:tests){
System.out.printf(%,9.2f,ti.nanosPerChar());
}
System.out.println();
}
}


In Java, what would the fastest way to iterate over all the chars in a String, this:

String str = "a really, really long string";
for (int i = 0, n = str.length(); i < n; i++) {
    char c = str.charAt(i);
}

Or this:

char[] chars = str.toCharArray();
for (int i = 0, n = chars.length; i < n; i++) {
    char c = chars[i];
}

EDIT :

What I'd like to know is if the cost of repeatedly calling the charAt method during a long iteration ends up being either less than or greater than the cost of performing a single call to toCharArray at the beginning and then directly accessing the array during the iteration.

It'd be great if someone could provide a robust benchmark for different string lengths, having in mind JIT warm-up time, JVM start-up time, etc. and not just the difference between two calls to System.currentTimeMillis().

解决方案

FIRST UPDATE: Before you try this ever in a production environment (not advised), read this first: http://www.javaspecialists.eu/archive/Issue237.html Starting from Java 9, the solution as described won't work anymore, because now Java will store strings as byte[] by default.

SECOND UPDATE: As of 2016-10-25, on my AMDx64 8core and source 1.8, there is no difference between using 'charAt' and field access. It appears that the jvm is sufficiently optimized to inline and streamline any 'string.charAt(n)' calls.

It all depends on the length of the String being inspected. If, as the question says, it is for long strings, the fastest way to inspect the string is to use reflection to access the backing char[] of the string.

A fully randomized benchmark with JDK 8 (win32 and win64) on an 64 AMD Phenom II 4 core 955 @ 3.2 GHZ (in both client mode and server mode) with 9 different techniques (see below!) shows that using String.charAt(n) is the fastest for small strings and that using reflection to access the String backing array is almost twice as fast for large strings.

THE EXPERIMENT

  • 9 different optimization techniques are tried.

  • All string contents are randomized

  • The test are done for string sizes in multiples of two starting with 0,1,2,4,8,16 etc.

  • The tests are done 1,000 times per string size

  • The tests are shuffled into random order each time. In other words, the tests are done in random order every time they are done, over 1000 times over.

  • The entire test suite is done forwards, and backwards, to show the effect of JVM warmup on optimization and times.

  • The entire suite is done twice, once in -client mode and the other in -server mode.

CONCLUSIONS

-client mode (32 bit)

For strings 1 to 256 characters in length, calling string.charAt(i) wins with an average processing of 13.4 million to 588 million characters per second.

Also, it is overall 5.5% faster (client) and 13.9% (server) like this:

    for (int i = 0; i < data.length(); i++) {
        if (data.charAt(i) <= ' ') {
            doThrow();
        }
    }

than like this with a local final length variable:

    final int len = data.length();
    for (int i = 0; i < len; i++) {
        if (data.charAt(i) <= ' ') {
            doThrow();
        }
    }

For long strings, 512 to 256K characters length, using reflection to access the String's backing array is fastest. This technique is almost twice as fast as String.charAt(i) (178% faster). The average speed over this range was 1.111 billion characters per second.

The Field must be obtained ahead of time and then it can be re-used in the library on different strings. Interestingly, unlike the code above, with Field access, it is 9% faster to have a local final length variable than to use 'chars.length' in the loop check. Here is how Field access can be setup as fastest:

   final Field field = String.class.getDeclaredField("value");
   field.setAccessible(true);

   try {
       final char[] chars = (char[]) field.get(data);
       final int len = chars.length;
       for (int i = 0; i < len; i++) {
           if (chars[i] <= ' ') {
               doThrow();
           }
       }
       return len;
   } catch (Exception ex) {
       throw new RuntimeException(ex);
   }

Special comments on -server mode

Field access starting winning after 32 character length strings in server mode on a 64 bit Java machine on my AMD 64 machine. That was not seen until 512 characters length in client mode.

Also worth noting I think, when I was running JDK 8 (32 bit build) in server mode, the overall performance was 7% slower for both large and small strings. This was with build 121 Dec 2013 of JDK 8 early release. So, for now, it seems that 32 bit server mode is slower than 32 bit client mode.

That being said ... it seems the only server mode that is worth invoking is on a 64 bit machine. Otherwise it actually hampers performance.

For 32 bit build running in -server mode on an AMD64, I can say this:

  1. String.charAt(i) is the clear winner overall. Although between sizes 8 to 512 characters there were winners among 'new' 'reuse' and 'field'.
  2. String.charAt(i) is 45% faster in client mode
  3. Field access is twice as fast for large Strings in client mode.

Also worth saying, String.chars() (Stream and the parallel version) are a bust. Way slower than any other way. The Streams API is a rather slow way to perform general string operations.

Wish List

Java String could have predicate accepting optimized methods such as contains(predicate), forEach(consumer), forEachWithIndex(consumer). Thus, without the need for the user to know the length or repeat calls to String methods, these could help parsing libraries beep-beep beep speedup.

Keep dreaming :)

Happy Strings!

~SH

The test used the following 9 methods of testing the string for the presence of whitespace:

"charAt1" -- CHECK THE STRING CONTENTS THE USUAL WAY:

int charAtMethod1(final String data) {
    final int len = data.length();
    for (int i = 0; i < len; i++) {
        if (data.charAt(i) <= ' ') {
            doThrow();
        }
    }
    return len;
}

"charAt2" -- SAME AS ABOVE BUT USE String.length() INSTEAD OF MAKING A FINAL LOCAL int FOR THE LENGTh

int charAtMethod2(final String data) {
    for (int i = 0; i < data.length(); i++) {
        if (data.charAt(i) <= ' ') {
            doThrow();
        }
    }
    return data.length();
}

"stream" -- USE THE NEW JAVA-8 String's IntStream AND PASS IT A PREDICATE TO DO THE CHECKING

int streamMethod(final String data, final IntPredicate predicate) {
    if (data.chars().anyMatch(predicate)) {
        doThrow();
    }
    return data.length();
}

"streamPara" -- SAME AS ABOVE, BUT OH-LA-LA - GO PARALLEL!!!

// avoid this at all costs
int streamParallelMethod(final String data, IntPredicate predicate) {
    if (data.chars().parallel().anyMatch(predicate)) {
        doThrow();
    }
    return data.length();
}

"reuse" -- REFILL A REUSABLE char[] WITH THE STRINGS CONTENTS

int reuseBuffMethod(final char[] reusable, final String data) {
    final int len = data.length();
    data.getChars(0, len, reusable, 0);
    for (int i = 0; i < len; i++) {
        if (reusable[i] <= ' ') {
            doThrow();
        }
    }
    return len;
}

"new1" -- OBTAIN A NEW COPY OF THE char[] FROM THE STRING

int newMethod1(final String data) {
    final int len = data.length();
    final char[] copy = data.toCharArray();
    for (int i = 0; i < len; i++) {
        if (copy[i] <= ' ') {
            doThrow();
        }
    }
    return len;
}

"new2" -- SAME AS ABOVE, BUT USE "FOR-EACH"

int newMethod2(final String data) {
    for (final char c : data.toCharArray()) {
        if (c <= ' ') {
            doThrow();
        }
    }
    return data.length();
}

"field1" -- FANCY!! OBTAIN FIELD FOR ACCESS TO THE STRING'S INTERNAL char[]

int fieldMethod1(final Field field, final String data) {
    try {
        final char[] chars = (char[]) field.get(data);
        final int len = chars.length;
        for (int i = 0; i < len; i++) {
            if (chars[i] <= ' ') {
                doThrow();
            }
        }
        return len;
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
}

"field2" -- SAME AS ABOVE, BUT USE "FOR-EACH"

int fieldMethod2(final Field field, final String data) {
    final char[] chars;
    try {
        chars = (char[]) field.get(data);
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
    for (final char c : chars) {
        if (c <= ' ') {
            doThrow();
        }
    }
    return chars.length;
}

COMPOSITE RESULTS FOR CLIENT -client MODE (forwards and backwards tests combined)

Note: that the -client mode with Java 32 bit and -server mode with Java 64 bit are the same as below on my AMD64 machine.

Size     WINNER  charAt1 charAt2  stream streamPar   reuse    new1    new2  field1  field2
1        charAt    77.0     72.0   462.0     584.0   127.5    89.5    86.0   159.5   165.0
2        charAt    38.0     36.5   284.0   32712.5    57.5    48.3    50.3    89.0    91.5
4        charAt    19.5     18.5   458.6    3169.0    33.0    26.8    27.5    54.1    52.6
8        charAt     9.8      9.9   100.5    1370.9    17.3    14.4    15.0    26.9    26.4
16       charAt     6.1      6.5    73.4     857.0     8.4     8.2     8.3    13.6    13.5
32       charAt     3.9      3.7    54.8     428.9     5.0     4.9     4.7     7.0     7.2
64       charAt     2.7      2.6    48.2     232.9     3.0     3.2     3.3     3.9     4.0
128      charAt     2.1      1.9    43.7     138.8     2.1     2.6     2.6     2.4     2.6
256      charAt     1.9      1.6    42.4      90.6     1.7     2.1     2.1     1.7     1.8
512      field1     1.7      1.4    40.6      60.5     1.4     1.9     1.9     1.3     1.4
1,024    field1     1.6      1.4    40.0      45.6     1.2     1.9     2.1     1.0     1.2
2,048    field1     1.6      1.3    40.0      36.2     1.2     1.8     1.7     0.9     1.1
4,096    field1     1.6      1.3    39.7      32.6     1.2     1.8     1.7     0.9     1.0
8,192    field1     1.6      1.3    39.6      30.5     1.2     1.8     1.7     0.9     1.0
16,384   field1     1.6      1.3    39.8      28.4     1.2     1.8     1.7     0.8     1.0
32,768   field1     1.6      1.3    40.0      26.7     1.3     1.8     1.7     0.8     1.0
65,536   field1     1.6      1.3    39.8      26.3     1.3     1.8     1.7     0.8     1.0
131,072  field1     1.6      1.3    40.1      25.4     1.4     1.9     1.8     0.8     1.0
262,144  field1     1.6      1.3    39.6      25.2     1.5     1.9     1.9     0.8     1.0

COMPOSITE RESULTS FOR SERVER -server MODE (forwards and backwards tests combined)

Note: this is the test for Java 32 bit running in server mode on an AMD64. The server mode for Java 64 bit was the same as Java 32 bit in client mode except that Field access starting winning after 32 characters size.

Size     WINNER  charAt1 charAt2  stream streamPar   reuse    new1    new2  field1  field2
1        charAt     74.5    95.5   524.5     783.0    90.5   102.5    90.5   135.0   151.5
2        charAt     48.5    53.0   305.0   30851.3    59.3    57.5    52.0    88.5    91.8
4        charAt     28.8    32.1   132.8    2465.1    37.6    33.9    32.3    49.0    47.0
8          new2     18.0    18.6    63.4    1541.3    18.5    17.9    17.6    25.4    25.8
16         new2     14.0    14.7   129.4    1034.7    12.5    16.2    12.0    16.0    16.6
32         new2      7.8     9.1    19.3     431.5     8.1     7.0     6.7     7.9     8.7
64        reuse      6.1     7.5    11.7     204.7     3.5     3.9     4.3     4.2     4.1
128       reuse      6.8     6.8     9.0     101.0     2.6     3.0     3.0     2.6     2.7
256      field2      6.2     6.5     6.9      57.2     2.4     2.7     2.9     2.3     2.3
512       reuse      4.3     4.9     5.8      28.2     2.0     2.6     2.6     2.1     2.1
1,024    charAt      2.0     1.8     5.3      17.6     2.1     2.5     3.5     2.0     2.0
2,048    charAt      1.9     1.7     5.2      11.9     2.2     3.0     2.6     2.0     2.0
4,096    charAt      1.9     1.7     5.1       8.7     2.1     2.6     2.6     1.9     1.9
8,192    charAt      1.9     1.7     5.1       7.6     2.2     2.5     2.6     1.9     1.9
16,384   charAt      1.9     1.7     5.1       6.9     2.2     2.5     2.5     1.9     1.9
32,768   charAt      1.9     1.7     5.1       6.1     2.2     2.5     2.5     1.9     1.9
65,536   charAt      1.9     1.7     5.1       5.5     2.2     2.4     2.4     1.9     1.9
131,072  charAt      1.9     1.7     5.1       5.4     2.3     2.5     2.5     1.9     1.9
262,144  charAt      1.9     1.7     5.1       5.1     2.3     2.5     2.5     1.9     1.9

FULL RUNNABLE PROGRAM CODE

(to test on Java 7 and earlier, remove the two streams tests)

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.IntPredicate;

/**
 * @author Saint Hill <http://stackoverflow.com/users/1584255/saint-hill>
 */
public final class TestStrings {

    // we will not test strings longer than 512KM
    final int MAX_STRING_SIZE = 1024 * 256;

    // for each string size, we will do all the tests
    // this many times
    final int TRIES_PER_STRING_SIZE = 1000;

    public static void main(String[] args) throws Exception {
        new TestStrings().run();
    }

    void run() throws Exception {

        // double the length of the data until it reaches MAX chars long
        // 0,1,2,4,8,16,32,64,128,256 ... 
        final List<Integer> sizes = new ArrayList<>();
        for (int n = 0; n <= MAX_STRING_SIZE; n = (n == 0 ? 1 : n * 2)) {
            sizes.add(n);
        }

        // CREATE RANDOM (FOR SHUFFLING ORDER OF TESTS)
        final Random random = new Random();

        System.out.println("Rate in nanoseconds per character inspected.");
        System.out.printf("==== FORWARDS (tries per size: %s) ==== \n", TRIES_PER_STRING_SIZE);

        printHeadings(TRIES_PER_STRING_SIZE, random);

        for (int size : sizes) {
            reportResults(size, test(size, TRIES_PER_STRING_SIZE, random));
        }

        // reverse order or string sizes
        Collections.reverse(sizes);

        System.out.println("");
        System.out.println("Rate in nanoseconds per character inspected.");
        System.out.printf("==== BACKWARDS (tries per size: %s) ==== \n", TRIES_PER_STRING_SIZE);

        printHeadings(TRIES_PER_STRING_SIZE, random);

        for (int size : sizes) {
            reportResults(size, test(size, TRIES_PER_STRING_SIZE, random));

        }
    }

    ///
    ///
    ///  METHODS OF CHECKING THE CONTENTS
    ///  OF A STRING. ALWAYS CHECKING FOR
    ///  WHITESPACE (CHAR <=' ')
    ///  
    ///
    // CHECK THE STRING CONTENTS
    int charAtMethod1(final String data) {
        final int len = data.length();
        for (int i = 0; i < len; i++) {
            if (data.charAt(i) <= ' ') {
                doThrow();
            }
        }
        return len;
    }

    // SAME AS ABOVE BUT USE String.length()
    // instead of making a new final local int 
    int charAtMethod2(final String data) {
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) <= ' ') {
                doThrow();
            }
        }
        return data.length();
    }

    // USE new Java-8 String's IntStream
    // pass it a PREDICATE to do the checking
    int streamMethod(final String data, final IntPredicate predicate) {
        if (data.chars().anyMatch(predicate)) {
            doThrow();
        }
        return data.length();
    }

    // OH LA LA - GO PARALLEL!!!
    int streamParallelMethod(final String data, IntPredicate predicate) {
        if (data.chars().parallel().anyMatch(predicate)) {
            doThrow();
        }
        return data.length();
    }

    // Re-fill a resuable char[] with the contents
    // of the String's char[]
    int reuseBuffMethod(final char[] reusable, final String data) {
        final int len = data.length();
        data.getChars(0, len, reusable, 0);
        for (int i = 0; i < len; i++) {
            if (reusable[i] <= ' ') {
                doThrow();
            }
        }
        return len;
    }

    // Obtain a new copy of char[] from String
    int newMethod1(final String data) {
        final int len = data.length();
        final char[] copy = data.toCharArray();
        for (int i = 0; i < len; i++) {
            if (copy[i] <= ' ') {
                doThrow();
            }
        }
        return len;
    }

    // Obtain a new copy of char[] from String
    // but use FOR-EACH
    int newMethod2(final String data) {
        for (final char c : data.toCharArray()) {
            if (c <= ' ') {
                doThrow();
            }
        }
        return data.length();
    }

    // FANCY!
    // OBTAIN FIELD FOR ACCESS TO THE STRING'S
    // INTERNAL CHAR[]
    int fieldMethod1(final Field field, final String data) {
        try {
            final char[] chars = (char[]) field.get(data);
            final int len = chars.length;
            for (int i = 0; i < len; i++) {
                if (chars[i] <= ' ') {
                    doThrow();
                }
            }
            return len;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    // same as above but use FOR-EACH
    int fieldMethod2(final Field field, final String data) {
        final char[] chars;
        try {
            chars = (char[]) field.get(data);
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
        for (final char c : chars) {
            if (c <= ' ') {
                doThrow();
            }
        }
        return chars.length;
    }

    /**
     *
     * Make a list of tests. We will shuffle a copy of this list repeatedly
     * while we repeat this test.
     *
     * @param data
     * @return
     */
    List<Jobber> makeTests(String data) throws Exception {
        // make a list of tests
        final List<Jobber> tests = new ArrayList<Jobber>();

        tests.add(new Jobber("charAt1") {
            int check() {
                return charAtMethod1(data);
            }
        });

        tests.add(new Jobber("charAt2") {
            int check() {
                return charAtMethod2(data);
            }
        });

        tests.add(new Jobber("stream") {
            final IntPredicate predicate = new IntPredicate() {
                public boolean test(int value) {
                    return value <= ' ';
                }
            };

            int check() {
                return streamMethod(data, predicate);
            }
        });

        tests.add(new Jobber("streamPar") {
            final IntPredicate predicate = new IntPredicate() {
                public boolean test(int value) {
                    return value <= ' ';
                }
            };

            int check() {
                return streamParallelMethod(data, predicate);
            }
        });

        // Reusable char[] method
        tests.add(new Jobber("reuse") {
            final char[] cbuff = new char[MAX_STRING_SIZE];

            int check() {
                return reuseBuffMethod(cbuff, data);
            }
        });

        // New char[] from String
        tests.add(new Jobber("new1") {
            int check() {
                return newMethod1(data);
            }
        });

        // New char[] from String
        tests.add(new Jobber("new2") {
            int check() {
                return newMethod2(data);
            }
        });

        // Use reflection for field access
        tests.add(new Jobber("field1") {
            final Field field;

            {
                field = String.class.getDeclaredField("value");
                field.setAccessible(true);
            }

            int check() {
                return fieldMethod1(field, data);
            }
        });

        // Use reflection for field access
        tests.add(new Jobber("field2") {
            final Field field;

            {
                field = String.class.getDeclaredField("value");
                field.setAccessible(true);
            }

            int check() {
                return fieldMethod2(field, data);
            }
        });

        return tests;
    }

    /**
     * We use this class to keep track of test results
     */
    abstract class Jobber {

        final String name;
        long nanos;
        long chars;
        long runs;

        Jobber(String name) {
            this.name = name;
        }

        abstract int check();

        final double nanosPerChar() {
            double charsPerRun = chars / runs;
            long nanosPerRun = nanos / runs;
            return charsPerRun == 0 ? nanosPerRun : nanosPerRun / charsPerRun;
        }

        final void run() {
            runs++;
            long time = System.nanoTime();
            chars += check();
            nanos += System.nanoTime() - time;
        }
    }

    // MAKE A TEST STRING OF RANDOM CHARACTERS A-Z
    private String makeTestString(int testSize, char start, char end) {
        Random r = new Random();
        char[] data = new char[testSize];
        for (int i = 0; i < data.length; i++) {
            data[i] = (char) (start + r.nextInt(end));
        }
        return new String(data);
    }

    // WE DO THIS IF WE FIND AN ILLEGAL CHARACTER IN THE STRING
    public void doThrow() {
        throw new RuntimeException("Bzzzt -- Illegal Character!!");
    }

    /**
     * 1. get random string of correct length 2. get tests (List<Jobber>) 3.
     * perform tests repeatedly, shuffling each time
     */
    List<Jobber> test(int size, int tries, Random random) throws Exception {
        String data = makeTestString(size, 'A', 'Z');
        List<Jobber> tests = makeTests(data);
        List<Jobber> copy = new ArrayList<>(tests);
        while (tries-- > 0) {
            Collections.shuffle(copy, random);
            for (Jobber ti : copy) {
                ti.run();
            }
        }
        // check to make sure all char counts the same
        long runs = tests.get(0).runs;
        long count = tests.get(0).chars;
        for (Jobber ti : tests) {
            if (ti.runs != runs && ti.chars != count) {
                throw new Exception("Char counts should match if all correct algorithms");
            }
        }
        return tests;
    }

    private void printHeadings(final int TRIES_PER_STRING_SIZE, final Random random) throws Exception {
        System.out.print("  Size");
        for (Jobber ti : test(0, TRIES_PER_STRING_SIZE, random)) {
            System.out.printf("%9s", ti.name);
        }
        System.out.println("");
    }

    private void reportResults(int size, List<Jobber> tests) {
        System.out.printf("%6d", size);
        for (Jobber ti : tests) {
            System.out.printf("%,9.2f", ti.nanosPerChar());
        }
        System.out.println("");
    }
}

这篇关于最快的方式来迭代字符串中的所有字符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆