How to compare two huge List<String> in Java?

Question

My application generates two big lists (up to 3.5 million string records). I need the best and fastest way to compare them. Currently I am doing it like this:

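// ListUtils is from Apache Commons Collections
List<String> list1 = ListUtils.subtract(sourceDbResults, hiveResults); // in the source DB but not in Hive
List<String> list2 = ListUtils.subtract(hiveResults, sourceDbResults); // in Hive but not in the source DB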

But, as I can see from jconsole, this approach is really expensive in memory, and sometimes the process even gets stuck on it. Any good solutions or ideas?

The element positions/order in the lists are always the same, so I don't need to deal with that. After comparing, I need to know whether the lists are the same and, if they are not, get the differences between them. subtract works perfectly for small lists.

Recommended answer

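Given that you've said your two lists are already sorted, they can be compared in a single pass in O(N) time (N being the total number of elements), which is much faster than your current ListUtils-based solution, and needs no extra memory beyond the two result collections. The following method uses an algorithm similar to the textbook merge of two sorted lists (the merge step of merge sort): whichever list has the smaller current element, that element has no match in the other list, so it is recorded as a difference and that list advances; when the current elements are equal, both lists advance.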

import java.util.*;

public class CompareSortedLists {
    public static void main(String[] args) {
        List<Integer> sourceDbResults = Arrays.asList(1, 2, 3, 4, 5, 8);
        List<Integer> hiveResults = Arrays.asList(2, 3, 6, 7);
        List<Integer> inSourceDb_notInHive = new ArrayList<>();
        List<Integer> inHive_notInSourceDb = new ArrayList<>();

        compareSortedLists(
                sourceDbResults, hiveResults,
                inSourceDb_notInHive, inHive_notInSourceDb);

        // These checks only run when assertions are enabled: java -ea CompareSortedLists
        assert inSourceDb_notInHive.equals(Arrays.asList(1, 4, 5, 8));
        assert inHive_notInSourceDb.equals(Arrays.asList(6, 7));
    }

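    /**
     * Compares two sorted lists (or other iterable collections in ascending order).
     * Adds to onlyInList1 every element of list1 that is not in list2, and
     * conversely to onlyInList2. Duplicates are matched one-for-one, so an element
     * that occurs twice in list1 and once in list2 is added once to onlyInList1.
     * The caller must ensure that both inputs are already sorted in ascending
     * natural order (consistent with compareTo), that they contain no null
     * elements (null is used internally to mark an exhausted iterator), and that
     * onlyInList1 and onlyInList2 are empty, writable collections.
     */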
    public static <T extends Comparable<? super T>> void compareSortedLists(
            Iterable<T> list1, Iterable<T> list2,
            Collection<T> onlyInList1, Collection<T> onlyInList2) {
        Iterator<T> it1 = list1.iterator();
        Iterator<T> it2 = list2.iterator();
        T e1 = it1.hasNext() ? it1.next() : null;
        T e2 = it2.hasNext() ? it2.next() : null;
        while (e1 != null || e2 != null) {
            if (e2 == null) {  // No more elements in list2, some remaining in list1
                onlyInList1.add(e1);
                e1 = it1.hasNext() ? it1.next() : null;
            }
            else if (e1 == null) {  // No more elements in list1, some remaining in list2
                onlyInList2.add(e2);
                e2 = it2.hasNext() ? it2.next() : null;
            }
            else {
                int comp = e1.compareTo(e2);
                if (comp < 0) {
                    onlyInList1.add(e1);
                    e1 = it1.hasNext() ? it1.next() : null;
                }
                else if (comp > 0) {
                    onlyInList2.add(e2);
                    e2 = it2.hasNext() ? it2.next() : null;
                }
                else /* comp == 0 */ {
                    e1 = it1.hasNext() ? it1.next() : null;
                    e2 = it2.hasNext() ? it2.next() : null;
                }
            }
        }
    }
}

The method above uses no external libraries and works with any version of Java from 6 upwards (the demo main method uses the diamond operator, new ArrayList<>(), which needs Java 7). If you use a PeekingIterator, such as the one from Apache Commons Collections or Guava, or write your own, you can make the method simpler, especially if you also use Java 8:

public static <T extends Comparable<? super T>> void compareSortedLists(
        Iterable<T> list1, Iterable<T> list2,
        Collection<T> onlyInList1, Collection<T> onlyInList2) {
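    // This PeekingIterator is org.apache.commons.collections4.iterators.PeekingIterator;
    // with Guava's PeekingIterator interface, use Iterators.peekingIterator(list1.iterator()) instead.
    PeekingIterator<T> it1 = new PeekingIterator<>(list1.iterator());
    PeekingIterator<T> it2 = new PeekingIterator<>(list2.iterator());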
    while (it1.hasNext() && it2.hasNext()) {
        int comp = it1.peek().compareTo(it2.peek());
        if (comp < 0)
            onlyInList1.add(it1.next());
        else if (comp > 0)
            onlyInList2.add(it2.next());
        else /* comp == 0 */ {
            it1.next();
            it2.next();
        }
    }
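    // Whatever remains in either iterator has no match in the other list
    it1.forEachRemaining(onlyInList1::add);   // Iterator.forEachRemaining is Java 8+
    it2.forEachRemaining(onlyInList2::add);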
}
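The answer mentions writing your own PeekingIterator. Below is a minimal sketch of that option, assuming Java 8 or later. It also takes an explicit Comparator, because the question is about List<String> and the merge is only correct if both lists are sorted by exactly the ordering used to compare them. For example, a database ORDER BY under a locale-aware collation may not agree with String.compareTo, which compares UTF-16 code units. The class names ComparePeekingSketch and SimplePeekingIterator, the Comparator parameter, and the sample data are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class ComparePeekingSketch {

    /** Minimal hand-written peeking iterator (hypothetical helper, not a library class). */
    static final class SimplePeekingIterator<T> implements Iterator<T> {
        private final Iterator<? extends T> source;
        private T peeked;           // buffered element, meaningful only when hasPeeked is true
        private boolean hasPeeked;  // separate flag, so a buffered null is not mistaken for "empty"

        SimplePeekingIterator(Iterator<? extends T> source) {
            this.source = source;
        }

        @Override
        public boolean hasNext() {
            return hasPeeked || source.hasNext();
        }

        /** Returns the next element without consuming it; throws NoSuchElementException if none. */
        public T peek() {
            if (!hasPeeked) {
                peeked = source.next();
                hasPeeked = true;
            }
            return peeked;
        }

        @Override
        public T next() {
            if (!hasPeeked) {
                return source.next();
            }
            T result = peeked;
            peeked = null;      // stop holding a reference to the consumed element
            hasPeeked = false;
            return result;
        }
    }

    /** Same merge as the simpler version, but ordered by cmp (assumption: both inputs are sorted by cmp). */
    public static <T> void compareSortedLists(
            Iterable<T> list1, Iterable<T> list2, Comparator<? super T> cmp,
            Collection<T> onlyInList1, Collection<T> onlyInList2) {
        SimplePeekingIterator<T> it1 = new SimplePeekingIterator<>(list1.iterator());
        SimplePeekingIterator<T> it2 = new SimplePeekingIterator<>(list2.iterator());
        while (it1.hasNext() && it2.hasNext()) {
            int comp = cmp.compare(it1.peek(), it2.peek());
            if (comp < 0) {
                onlyInList1.add(it1.next());   // cannot appear later in the sorted list2
            } else if (comp > 0) {
                onlyInList2.add(it2.next());   // cannot appear later in the sorted list1
            } else {                           // present in both lists: consume one from each
                it1.next();
                it2.next();
            }
        }
        it1.forEachRemaining(onlyInList1::add);   // leftovers have no match in list2
        it2.forEachRemaining(onlyInList2::add);   // leftovers have no match in list1
    }

    public static void main(String[] args) {
        List<String> sourceDbResults = Arrays.asList("apple", "banana", "cherry", "fig");
        List<String> hiveResults = Arrays.asList("banana", "cherry", "date");
        List<String> onlyInSource = new ArrayList<>();
        List<String> onlyInHive = new ArrayList<>();

        compareSortedLists(sourceDbResults, hiveResults, Comparator.naturalOrder(),
                onlyInSource, onlyInHive);

        boolean identical = onlyInSource.isEmpty() && onlyInHive.isEmpty();
        System.out.println("identical = " + identical);          // identical = false
        System.out.println("only in source = " + onlyInSource);  // only in source = [apple, fig]
        System.out.println("only in hive = " + onlyInHive);      // only in hive = [date]
    }
}
```

If both result lists come back empty, the inputs contain the same elements (as judged by the comparator), which answers the "are they the same?" part of the question in the same pass. The iterator buffers at most one element per list, so apart from the two result collections the comparison needs only constant extra memory.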
