Refactoring Java arrays and primitives (double[][]) to Collections and Generics (List<List<Double>>)


Problem description

I have been refactoring throwaway code which I wrote some years ago in a FORTRAN-like style. Most of the code is now much more organized and readable. However the heart of the algorithm (which is performance-critical) uses 1- and 2-dimensional Java arrays and is typified by:

    for (int j = 1; j < len[1]+1; j++) {
        int jj = (cont == BY_TYPE) ? seq[1][j-1] : j-1;
        for (int i = 1; i < len[0]+1; i++) {
            matrix[i][j] = matrix[i-1][j] + gap;
            double m = matrix[i][j-1] + gap;
            if (m > matrix[i][j]) {
                matrix[i][j] = m;
                pointers[i][j] = UP;
            }
            //...
        }
    }

For clarity, maintainability and interfacing with the rest of the code, I would like to refactor it. However, on reading "Java Generics Syntax for arrays" and "Java Generics and numbers", I have the following concerns:

  • Performance. The code is planned to use about 10^8 - 10^9 seconds per year, and this is just about manageable. My reading suggests that changing double to Double can sometimes cost a factor of 3 in performance. I'd like to hear other experience on this. I would also expect that moving from foo[] to List would be a hit as well. I have no first-hand knowledge, and again experience would be useful.

  • Array-bound checking. Is this treated differently in double[] and List and does it matter? I expect some problems to violate bounds as the algorithm is fairly simple and has only been applied to a few data sets.

  • If I don't refactor then the code has an ugly and possibly fragile intermixture of the two approaches. I am already trying to write things such as:

    List<double[]> and List<Double>[]

and understand that the erasure does not make this pretty and at best gives rise to compiler warnings. It seems difficult to do this without very convoluted constructs.

  • Obsolescence. One poster suggested that Java arrays should be obsoleted. I assume this isn't going to happen RSN but I would like to move away from outdated approaches.

SUMMARY The consensus so far:

  • Collections have a significant performance hit over primitive arrays, especially for constructs such as matrices. The cost is incurred in auto(un)boxing numeric values and in accessing list items.

  • For tight numerical (scientific) algorithms the array notation [][] is actually easier to read, but the variables should be named as helpfully as possible.

  • Generics and arrays do not mix well. It may be useful to wrap the arrays in classes to transport them in/out of the tight algorithm.

There is little objective reason to make the change.
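The wrapping idea from the summary could look roughly like the sketch below. The names `ScoreMatrix`, `raw()` and `ScoreMatrixDemo` are invented for this illustration and are not from the original code; the point is only that a small class can carry a `double[][]` through generic collections while the tight loop keeps working on the raw array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: the name ScoreMatrix and its methods are illustrative only.
final class ScoreMatrix {
    private final double[][] cells;

    ScoreMatrix(int rows, int cols) {
        this.cells = new double[rows][cols];
    }

    int rows() {
        return cells.length;
    }

    // Readable accessor for code outside the hot loop.
    double get(int i, int j) {
        return cells[i][j];
    }

    // Exposes the backing array so the tight algorithm can keep using [][] directly.
    double[][] raw() {
        return cells;
    }
}

class ScoreMatrixDemo {
    static List<ScoreMatrix> buildAll(int count, int rows, int cols) {
        // A List of wrappers is an ordinary generic type: no generic-array
        // creation is involved, so no unchecked warnings arise here.
        List<ScoreMatrix> all = new ArrayList<>();
        for (int k = 0; k < count; k++) {
            all.add(new ScoreMatrix(rows, cols));
        }
        return all;
    }
}
```

With something like this, the rest of the application can pass `List<ScoreMatrix>` around, and only the performance-critical method calls `raw()`.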

QUESTION @SeanOwen has suggested that it would be useful to take constant values out of the loops. Assuming I haven't goofed this would look like:

    int len1 = len[1];
    int len0 = len[0];
    int[] seq1 = seq[1];    // seq[1] is a row, so it is an int[], not an int
    for (int j = 1; j < len1+1; j++) {
        int jj = (cont == BY_TYPE) ? seq1[j-1] : j-1;
        for (int i = 1; i < len0+1; i++) {
            // i is the inner index, so the row references can only be hoisted to here
            double[] matrixi = matrix[i];
            int[] pointersi = pointers[i];
            matrixi[j] = matrix[i-1][j] + gap;
            double m = matrixi[j-1] + gap;
            if (m > matrixi[j]) {
                matrixi[j] = m;
                pointersi[j] = UP;
            }
            //...
        }
    }

I thought compilers were meant to be smart at doing this sort of thing. Do we still need to do this?

Solution

I read an excellent book by Kent Beck on coding best practices ( http://www.amazon.com/Implementation-Patterns/dp/B000XPRRVM ). It also contains interesting performance figures. Specifically, it compares arrays with various collections, and arrays are really much faster (maybe 3x compared to ArrayList).

Also, if you use Double instead of double, you need to stick to it and not mix in double, as auto(un)boxing will kill your performance.

Considering your performance needs, I would stick to arrays of primitive type.
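As a rough illustration of where the cost comes from, the sketch below fills the same small table twice, once as a `double[][]` and once as a `List<List<Double>>`. The class and method names (`BoxingSketch`, `sumPrimitive`, `sumBoxed`) are made up for this example, and it is not a benchmark: it only shows that the collection version has to box on every write and unbox on every read while computing the same result.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingSketch {
    // Primitive version: values live directly in the array, no object per cell.
    static double sumPrimitive(int rows, int cols, double gap) {
        double[][] matrix = new double[rows][cols];
        for (int i = 1; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                matrix[i][j] = matrix[i - 1][j] + gap;
            }
        }
        double sum = 0.0;
        for (double[] row : matrix) {
            for (double v : row) {
                sum += v;
            }
        }
        return sum;
    }

    // Collection version: every get() unboxes a Double and every set() boxes one.
    static double sumBoxed(int rows, int cols, double gap) {
        List<List<Double>> matrix = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            List<Double> row = new ArrayList<>(cols);
            for (int j = 0; j < cols; j++) {
                row.add(0.0);                              // autoboxing: double -> Double
            }
            matrix.add(row);
        }
        for (int i = 1; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double above = matrix.get(i - 1).get(j);   // auto-unboxing
                matrix.get(i).set(j, above + gap);         // autoboxing again
            }
        }
        double sum = 0.0;
        for (List<Double> row : matrix) {
            for (double v : row) {                         // auto-unboxing
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(4, 3, 1.0));
        System.out.println(sumBoxed(4, 3, 1.0));
    }
}
```

Both methods return the same value; any real speed difference would have to be measured on the actual workload, ideally with a proper benchmarking harness.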


Furthermore, I would calculate the upper bound for the loop condition only once. This is typically done on the line before the loop.

However, if you don't like that the upper bound variable, used only in the loop, is accessible outside the loop, you can take advantage of the initialization phase of the for loop like this:

    for (int i=0, max=list.size(); i<max; i++) {
      // do something
    }


I don't believe arrays will become obsolete in Java. For performance-critical loops, I can't see any language designer taking away the fastest option (especially if the difference is 3x).


I understand your concern for maintainability, and for coherence with the rest of the application. But I believe that a critical loop is entitled to some special practices.

I would try to make the code as clear as possible without changing it:

  • by carefully questioning each variable name, ideally in a 10-minute brainstorming session with my colleagues
  • by writing code comments (I'm against their use in general, as code that is not clear should be made clear, not commented; but a critical loop justifies it)
  • by using private methods as needed (as Andreas_D pointed out in his answer). If made private final, chances are very good (as they would be short) that they will get inlined when running, so there would be no performance impact at runtime.
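The three suggestions above might combine into something like the following sketch, which keeps the primitive arrays from the question but uses descriptive names, a comment, and a short private helper. Everything here is an assumption made for illustration: the class name `AlignmentSketch`, the constants `NONE` and `UP`, the method names, and the table shapes. Whether the JIT actually inlines the helper depends on the JVM and is not guaranteed.

```java
class AlignmentSketch {
    // Hypothetical pointer codes; the question only shows that a constant UP exists.
    static final int NONE = 0;
    static final int UP = 1;

    // Fills the score and pointer tables column by column, as in the question's loop.
    static void fill(double[][] score, int[][] pointers, double gap) {
        final int rowCount = score.length;      // upper bounds computed once
        final int colCount = score[0].length;
        for (int col = 1; col < colCount; col++) {
            for (int row = 1; row < rowCount; row++) {
                // Start from the cell in the previous row, then see whether
                // the candidate from the previous column scores higher.
                score[row][col] = score[row - 1][col] + gap;
                double fromPreviousCol = score[row][col - 1] + gap;
                keepBetter(score, pointers, row, col, fromPreviousCol, UP);
            }
        }
    }

    // Short private helper: small enough that a JIT compiler is likely to inline it.
    private static void keepBetter(double[][] score, int[][] pointers,
                                   int row, int col, double candidate, int direction) {
        if (candidate > score[row][col]) {
            score[row][col] = candidate;
            pointers[row][col] = direction;
        }
    }
}
```

The loop body now reads as two named steps, and the array accesses stay exactly as cheap as in the original version.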
