为什么Java的ArrayList的remove功能好像成本这么低? [英] Why does Java's ArrayList's remove function seem to cost so little?

查看:16
本文介绍了为什么Java的ArrayList的remove功能好像成本这么低?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个函数可以处理一个非常大的列表,超过大约 250,000 个项目.对于这些项目中的大多数,它只是替换位置 x 的项目.但是,对于其中大约 5% 的人,它必须将它们从列表中删除.

I have a function which manipulates a very large list, exceeding about 250,000 items. For the majority of those items, it simply replaces the item at position x. However, for about 5% of them, it must remove them from the list.

使用 LinkedList 似乎是避免代价高昂的删除的最明显的解决方案.然而,自然地,随着时间的推移,通过索引访问 LinkedList 变得越来越慢.这里的成本是几分钟(而且很多).

Using a LinkedList seemed to be the most obvious solution to avoid expensive removals. However, naturally, accessing a LinkedList by index becomes increasingly slow as time goes on. The cost here is minutes (and a lot of them).

在该 LinkedList 上使用迭代器也很昂贵,因为我似乎需要一个单独的副本来避免在编辑该列表时出现迭代器并发问题.这里的费用是几分钟.

Using an Iterator over that LinkedList is also expensive, as I appear to need a separate copy to avoid Iterator concurrency issues while editing that list. The cost here is minutes.

然而,这就是我的想法有点出乎意料的地方.如果我更改为 ArrayList,它几乎立即运行.

However, here's where my mind is blown a bit. If I change to an ArrayList, it runs almost instantly.

对于包含 297515 个元素的列表,删除 11958 个元素并修改其他所有内容需要 909 毫秒.我验证了结果列表的大小确实是 285557,正如预期的那样,并且包含我需要的更新信息.

For a list with 297515 elements, removing 11958 elements and modifying everything else takes 909ms. I verified that the resulting list is indeed 285557 in size, as expected, and contains the updated information I need.

为什么这么快?我查看了 JDK6 中 ArrayList 的源代码,它似乎按预期使用了 arraycopy 函数.我很想理解为什么 ArrayList 在这里工作得如此好,而常识似乎表明用于此任务的数组是一个糟糕的主意,需要移动数十万个项目.

Why is this so fast? I looked at the source for ArrayList in JDK6 and it appears to be using an arraycopy function as expected. I would love to understand why an ArrayList works so well here when common sense would seem to indicate that an array for this task is an awful idea, requiring shifting several hundred thousand items.

推荐答案

我运行了一个基准测试,尝试了以下每种策略来过滤列表元素:

I ran a benchmark, trying each of the following strategies for filtering the list elements:

  • 将需要的元素复制到新列表中
  • 使用 Iterator.remove()ArrayList
  • 中删除不需要的元素
  • 使用 Iterator.remove()LinkedList
  • 中删除不需要的元素
  • 就地压缩列表(将想要的元素移动到较低的位置)
  • ArrayList
  • 上按索引删除 (List.remove(int))
  • LinkedList
  • 上按索引删除 (List.remove(int))
  • Copy the wanted elements into a new list
  • Use Iterator.remove() to remove the unwanted elements from an ArrayList
  • Use Iterator.remove() to remove the unwanted elements from a LinkedList
  • Compact the list in-place (moving the wanted elements to lower positions)
  • Remove by index (List.remove(int)) on an ArrayList
  • Remove by index (List.remove(int)) on a LinkedList

每次我用 100000 个 Point 随机实例填充列表并使用过滤条件(基于哈希码),该条件将接受 95% 的元素并拒绝剩余的 5%(相同问题中所述的比例,但列表较小,因为我没有时间对 250000 个元素进行测试.)

Each time I populated the list with 100000 random instances of Point and used a filter condition (based on the hash code) that would accept 95% of elements and reject the remaining 5% (the same proportion stated in the question, but with a smaller list because I didn't have time to run the test for 250000 elements.)

平均时间(在我的旧 MacBook Pro 上:Core 2 Duo,2.2GHz,3Gb RAM)是:

And the average times (on my old MacBook Pro: Core 2 Duo, 2.2GHz, 3Gb RAM) were:

CopyIntoNewListWithIterator   :      4.24ms
CopyIntoNewListWithoutIterator:      3.57ms
FilterLinkedListInPlace       :      4.21ms
RandomRemoveByIndex           :    312.50ms
SequentialRemoveByIndex       :  33632.28ms
ShiftDown                     :      3.75ms

因此,从 LinkedList 中按索引删除元素比从 ArrayList 中删除元素要贵 300 多倍,而且可能比从 ArrayList 中删除要贵 6000-10000 倍其他方法(避免线性搜索和arraycopy)

So removing elements by index from a LinkedList was more than 300 times more expensive than removing them from an ArrayList, and probably somewhere between 6000-10000 times more expensive than the other methods (that avoid linear search and arraycopy)

这四种更快的方法之间似乎没有太大区别,但我再次使用包含 500000 个元素的列表再次运行了这四种方法,结果如下:

Here there doesn't seem to be much difference between the four faster methods, but I ran just those four again with a 500000-element list with the following results:

CopyIntoNewListWithIterator   :     92.49ms
CopyIntoNewListWithoutIterator:     71.77ms
FilterLinkedListInPlace       :     15.73ms
ShiftDown                     :     11.86ms

我猜,随着缓存容量的增加,缓存内存会成为限制因素,因此创建列表的第二个副本的成本变得很高.

I'm guessing that with the larger size cache memory becomes the limiting factor, so the cost of creating a second copy of the list becomes significant.

代码如下:

import java.awt.Point;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class ListBenchmark {

    public static void main(String[] args) {
        Random rnd = new SecureRandom();
        Map<String, Long> timings = new TreeMap<String, Long>();
        for (int outerPass = 0; outerPass < 10; ++ outerPass) {
            List<FilterStrategy> strategies =
                Arrays.asList(new CopyIntoNewListWithIterator(),
                              new CopyIntoNewListWithoutIterator(),
                              new FilterLinkedListInPlace(),
                              new RandomRemoveByIndex(),
                              new SequentialRemoveByIndex(),
                              new ShiftDown());
            for (FilterStrategy strategy: strategies) {
                String strategyName = strategy.getClass().getSimpleName();
                for (int innerPass = 0; innerPass < 10; ++ innerPass) {
                    strategy.populate(rnd);
                    if (outerPass >= 5 && innerPass >= 5) {
                        Long totalTime = timings.get(strategyName);
                        if (totalTime == null) totalTime = 0L;
                        timings.put(strategyName, totalTime - System.currentTimeMillis());
                    }
                    Collection<Point> filtered = strategy.filter();
                    if (outerPass >= 5 && innerPass >= 5) {
                        Long totalTime = timings.get(strategyName);
                        timings.put(strategy.getClass().getSimpleName(), totalTime + System.currentTimeMillis());
                    }
                    CHECKSUM += filtered.hashCode();
                    System.err.printf("%-30s %d %d %d%n", strategy.getClass().getSimpleName(), outerPass, innerPass, filtered.size());
                    strategy.clear();
                }
            }
        }
        for (Map.Entry<String, Long> e: timings.entrySet()) {
            System.err.printf("%-30s: %9.2fms%n", e.getKey(), e.getValue() * (1.0/25.0));
        }
    }

    public static volatile int CHECKSUM = 0;

    static void populate(Collection<Point> dst, Random rnd) {
        for (int i = 0; i < INITIAL_SIZE; ++ i) {
            dst.add(new Point(rnd.nextInt(), rnd.nextInt()));
        }
    }

    static boolean wanted(Point p) {
        return p.hashCode() % 20 != 0;
    }

    static abstract class FilterStrategy {
        abstract void clear();
        abstract Collection<Point> filter();
        abstract void populate(Random rnd);
    }

    static final int INITIAL_SIZE = 100000;

    private static class CopyIntoNewListWithIterator extends FilterStrategy {
        public CopyIntoNewListWithIterator() {
            list = new ArrayList<Point>(INITIAL_SIZE);
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            ArrayList<Point> dst = new ArrayList<Point>(list.size());
            for (Point p: list) {
                if (wanted(p)) dst.add(p);
            }
            return dst;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

    private static class CopyIntoNewListWithoutIterator extends FilterStrategy {
        public CopyIntoNewListWithoutIterator() {
            list = new ArrayList<Point>(INITIAL_SIZE);
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            int inputSize = list.size();
            ArrayList<Point> dst = new ArrayList<Point>(inputSize);
            for (int i = 0; i < inputSize; ++ i) {
                Point p = list.get(i);
                if (wanted(p)) dst.add(p);
            }
            return dst;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

    private static class FilterLinkedListInPlace extends FilterStrategy {
        public String toString() {
            return getClass().getSimpleName();
        }
        FilterLinkedListInPlace() {
            list = new LinkedList<Point>();
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            for (Iterator<Point> it = list.iterator();
                 it.hasNext();
                 ) {
                Point p = it.next();
                if (! wanted(p)) it.remove();
            }
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final LinkedList<Point> list;
    }

    private static class RandomRemoveByIndex extends FilterStrategy {
        public RandomRemoveByIndex() {
            list = new ArrayList<Point>(INITIAL_SIZE);
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            for (int i = 0; i < list.size();) {
                if (wanted(list.get(i))) {
                    ++ i;
                } else {
                    list.remove(i);
                }
            }
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

    private static class SequentialRemoveByIndex extends FilterStrategy {
        public SequentialRemoveByIndex() {
            list = new LinkedList<Point>();
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            for (int i = 0; i < list.size();) {
                if (wanted(list.get(i))) {
                    ++ i;
                } else {
                    list.remove(i);
                }
            }
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final LinkedList<Point> list;
    }

    private static class ShiftDown extends FilterStrategy {
        public ShiftDown() {
            list = new ArrayList<Point>();
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            int inputSize = list.size();
            int outputSize = 0;
            for (int i = 0; i < inputSize; ++ i) {
                Point p = list.get(i);
                if (wanted(p)) {
                    list.set(outputSize++, p);
                }
            }
            list.subList(outputSize, inputSize).clear();
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

}

这篇关于为什么Java的ArrayList的remove功能好像成本这么低?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆