从一个大的使用整型数组的Java删除重复 [英] Remove duplicates from a large integer array using Java

查看:216
本文介绍了从一个大的使用整型数组的Java删除重复的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

你知道的任何时候有效的方式,从使用Java一个非常大的整数数组删除重复值?阵列的大小取决于在用户的记录,但将总是超过150万的一些重复的未分类的值。每个整数包含100000和9999999之间的数值。

Do you know of any time efficient way to remove duplicated values from a very big integer array using Java? The size of the array depends on the logged in user, but will always exceed 1500000 unsorted values with some duplicates. Every integer contains a number between 100000 and 9999999.

我试图将其转换为一个列表,但我的服务器上的堆不允许这个数据量(我的ISP限制了它)。而对于环内循环定期需要5分钟来计算。

I tried converting it to a List, but the heap on my server doesn't allow this amount of data(my ISP has restricted it). And a regular for loop within a for loop takes over 5 minutes to calculate.

数组的不重复的大小是一个我将在我的数据库存储。

The size of the array without the duplicates is the one I will store in my database.

帮助将AP preciated!

Help would be appreciated!

推荐答案

您也许可以使用一点设置?我不知道Java的BitSet中是多么有效。但是9999999可能的值将只需要8分之9999999= 1250000字节=刚刚超过1MB。当你漫步值的数组,设置相应的位设置为true。然后,您可以随时找了一下设置为true在位为输出走相应的值。

You could perhaps use a bit set? I don't know how efficient Java's BitSet is. But 9999999 possible values would only take 9999999/8 = 1250000 bytes = just over 1Mb. As you walk the array of values, set the corresponding bit to true. Then you can walk over the bit set and output the corresponding value whenever you find a bit set to true.

为1Mb将适合在一个CPU的缓存,所以这可能取决于位设置的实施是相当有效的。

1Mb will fit in a CPU cache, so this could be quite efficient depending on the bit set implementation.

此也有太多排序的数据的副作用

This also has the side-effect of sorting the data too.

和...,因为它需要在输入数据单传,这是一个O(n)的算法,这组操作是O(1)(用于基于阵列的设置是这样),输出通行证还O(m),其中m是唯一值的数量,并且根据定义,必须是下; = N

And... this is an O(n) algorithm since it requires a single pass over the input data, the set operations are O(1) (for an array-based set like this), and the output pass is also O(m) where m is the number of unique values and, by definition, must be <= n.

这篇关于从一个大的使用整型数组的Java删除重复的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆