在 for 循环中重新创建 ArrayList 的最快方法 [英] Fastest way to recreate the ArrayList in a for loop

查看:27
本文介绍了在 for 循环中重新创建 ArrayList 的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 Java 中,对一个巨大的矩阵 X 使用以下函数来打印其列不同的元素:

In Java, using the following function for a huge matrix X to print its column-distinct elements:

// create the list of distinct values
List<Integer> values = new ArrayList<Integer>();

// X is n * m int[][] matrix
for (int j = 0, x; j < m; j++) {
    values.clear();
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (values.contains(x)) continue;
        System.out.println(x);
        values.add(x);
    }
}

首先我按列(索引 j)和行内(索引 i)迭代.

First I iterate by columns (index j) and inside by rows (index i).

这个函数会针对不同的矩阵被调用数百万次,所以应该优化代码以满足性能要求.我想知道 values 数组.使用 values = new ArrayList();values = null 而不是 values.clear() 会更快吗?

This function will be called millions of times for different matrices, so the code should be optimized to meet the performance requirements. I'm wondering about the values array. Would it be faster to use values = new ArrayList<Integer>(); or values = null instead of values.clear() ?

推荐答案

使用 设置 而不是列表,例如 HashSet 实现.contains 方法将在 O(1) 中运行,而不是 O(n) 中的列表.只需调用 add 方法就可以节省一次调用.

What would be much more efficient would be to use a Set instead of a list, for example the HashSet implementation. The contains method will run in O(1) instead of O(n) with a list. And you could save one call by only calling the add method.

至于您的具体问题,我只会在每个循环中创建一个新的 Set - 创建对象并不那么昂贵,可能比清除该集要少(正如底部的基准所证实的那样 - 请参阅 EDIT 中最有效的版本2):

As for your specific question, I would just create a new Set at each loop - object creation is not that expensive, probably less than clearing the set (as confirmed by the benchmark at the bottom - see the most efficient version in EDIT 2):

for (int j = 0, x; j < m; j++) {
    Set<Integer> values = new HashSet<Integer>();
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
        System.out.println(x);
    }
}

但是,了解哪个更快(新对象与清晰)的唯一方法是分析代码的那部分并检查两个版本的性能.

However, the only way to know which is quicker (new object vs. clear) is to profile that portion of your code and check the performance of both versions.

编辑

我运行了一个快速基准测试,清晰版本似乎比在每个循环中创建一个集合快一点(大约 20%).您仍然应该检查您的数据集/用例哪个更好.使用我的数据集编写更快的代码:

I ran a quick benchmark and the clear version seems a little faster than creating a set at each loop (by about 20%). You should still check on your dataset / use case which one is better. Faster code with my dataset:

Set<Integer> values = new HashSet<Integer>();
for (int j = 0, x; j < m; j++) {
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
        System.out.println(x);
    }
    values.clear();
}

编辑 2

通过在每个循环中创建一组正确大小的新集合,可以获得实际上更快的代码版本:

An actually even faster version of the code is obtained by creating a new set of the right size at each loop:

for (int j = 0, x; j < m; j++) {
    Set<Integer> values = new HashSet<Integer>(n, 1); //right size from the beginning
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
        System.out.println(x);
    }
}

结果总结

JVM 预热 + JIT 后:

After JVM warm up + JIT:

Set<Integer> values = new HashSet<Integer>(n, 1); =====> 280 ms
values.clear();                                   =====> 380 ms
Set<Integer> values = new HashSet<Integer>();     =====> 450 ms 

这篇关于在 for 循环中重新创建 ArrayList 的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆