Java数组,查找重复 [英] Java Array, Finding Duplicates
问题描述
我有一个数组,并正在寻找重复。
重复= FALSE;
为(J = 0; J<拉链codeList.length; J ++){
对于(K = 0; K<拉链codeList.length; k ++){
如果(ZIP codeLIST [K] ==拉链codeLIST [J]){
重复= TRUE;
}
}
}
不过,这code不工作时,有没有重复。个为什么吗?
在鼻子答案..
重复= FALSE;
为(J = 0; J<拉链codeList.length; J ++)
对于(K = J + 1; K<拉链codeList.length; k ++)
如果(K = J&安培;!&安培;邮编codeLIST [K] ==拉链codeLIST [J]。)
重复= TRUE;
编辑切换 .equals()
回 ==
,因为我读的地方你使用 INT
,这是不是在最初的问题清楚。还设置 K = J + 1
,减半执行时间,但它仍然为O(n 2 )。
更快(在极限)的方式
下面是一个哈希为基础的方法。你必须对自动装箱的薪酬,但它是为O(n),而不是为O(n 2 )。一个积极进取的灵魂会去找到一个原始的基于INT-哈希集合(Apache或谷歌集合有这样的事情,记错。)
布尔重复(最终诠释[]拉链codeLIST)
{
SET<整数GT;肿块=新的HashSet<整数GT;();
对于(INT I:拉链codeLIST)
{
如果(lump.contains(I)),则返回true;
lump.add(ⅰ);
}
返回false;
}
低头HuyLe
请参阅HuyLe's回答以或多或少O(n)的解决方案,我认为需要一对夫妇的用户互动,增加步骤:
静态布尔重复(最终诠释[]拉链codeLIST)
{
最终诠释MAXZIP = 99999;
布尔[] =位图新的布尔[MAXZIP + 1];
java.util.Arrays.fill(位图,FALSE);
对于(INT项目:拉链codeLIST)
(!位图[项目])如果位[项目] =真;
否则返回true;
}
返回false;
}
或只是紧凑
静态布尔重复(最终诠释[]拉链codeLIST)
{
最终诠释MAXZIP = 99999;
布尔[] =位图新的布尔[MAXZIP + 1]; // Java的保证初始化为假
对于(INT项目:拉链codeLIST)
如果(!(位图[项目] ^ = TRUE))返回true;
返回false;
}
有什么关系?
嘛,所以我跑了一个小的基准,这是前途未卜所有的地方,但这里的code:
进口java.util.BitSet中;类育
{
静态布尔duplicatesZero(最终诠释[]拉链codeLIST)
{
布尔重复= FALSE;
对于(INT J = 0; J<拉链codelist.length; J ++)
对于(INT K = J + 1; K<拉链codelist.length; k ++)
如果(K = J&安培;!&安培;邮编codeLIST [K] ==拉链codeLIST [J]。)
重复= TRUE; 返回副本;
}
静态布尔duplicatesOne(最终诠释[]拉链codeLIST)
{
最终诠释MAXZIP = 99999;
布尔[] =位图新的布尔[MAXZIP + 1];
java.util.Arrays.fill(位图,FALSE);
对于(INT项目:拉链codeLIST){
如果(!(位图[项目] ^ = TRUE))
返回true;
}
返回false;
} 静态布尔duplicatesTwo(最终诠释[]拉链codeLIST)
{
最终诠释MAXZIP = 99999; 位集合B =新的BitSet(MAXZIP + 1);
b.set(0,MAXZIP,FALSE);
对于(INT项目:拉链codeLIST){
如果(!b.get(项目)){
b.set(项目,真正的);
}其他
返回true;
}
返回false;
} 枚举ApproachT {NSQUARED,HashSet的,BITSET}; / **
* @参数ARGS
* /
公共静态无效的主要(字串[] args)
{
ApproachT方法= ApproachT.BITSET; 最终诠释REPS = 100;
最终诠释MAXZIP = 99999; INT [] =大小INT新[] {10,1000,10000,100000,百万};
长[] []次=新长[sizes.length] [REPS]; 布尔tossme = FALSE; 对(INT sizei = 0; sizei&下; sizes.length; sizei ++){
通信System.err.println(试行对于ZIP codeLIST大小=+尺寸[sizei]);
对于(INT代表= 0;代表<名代表;代表++){
INT []拉链codeLIST =新的INT [大小[sizei]];
的for(int i = 0; I<拉链codelist.length;我++){
拉链codeLIST [I] =(int)的(的Math.random()*(MAXZIP + 1));
}
长开始= System.currentTimeMillis的();
开关(方法){
案例NSQUARED:
tossme ^ =(duplicatesZero(邮政编码codeLIST));
打破;
案例HashSet的:
tossme ^ =(duplicatesOne(邮政编码codeLIST));
打破;
案例BITSET:
tossme ^ =(duplicatesTwo(邮政编码codeLIST));
打破; }
长端= System.currentTimeMillis的();
次[sizei] [代表] =结束 - 开始;
}
长期平均= 0;
对于(INT代表= 0;代表<名代表;代表++){
平均+ =倍[sizei] [代表];
}
通信System.err.println(大小=+尺寸[sizei] +,平均时间=
+平均/(双)REPS +毫秒);
}
}}
使用NSQUARED:
审判的大小= 10
大小= 10,平均时间为0.0ms
审判大小= 1000
大小= 1000,平均时间为0.0ms
审判大小= 10000
大小= 10000,平均时间为100.0ms
审判大小= 100000
大小= 100000,平均时间为9923.3ms
使用的HashSet
试用对于ZIP codeLIST大小= 10
大小= 10,平均时间= 0.16ms
试用对于ZIP codeLIST大小= 1000
大小= 1000,平均时间= 0.15ms
试用对于ZIP codeLIST大小= 10000
大小= 10000,平均时间为0.0ms
试用对于ZIP codeLIST大小= 100000
大小= 100000,平均时间= 0.16ms
试用对于ZIP codeLIST大小= 1000000
大小= 1000000,平均时间为0.0ms
使用的BitSet
试用对于ZIP codeLIST大小= 10
大小= 10,平均时间为0.0ms
试用对于ZIP codeLIST大小= 1000
大小= 1000,平均时间为0.0ms
试用对于ZIP codeLIST大小= 10000
大小= 10000,平均时间为0.0ms
试用对于ZIP codeLIST大小= 100000
大小= 100000,平均时间为0.0ms
试用对于ZIP codeLIST大小= 1000000
大小= 1000000,平均时间为0.0ms
BITSET获胜!
但是,只有通过发... .15ms是误差的currentTimeMillis()
之内,并且有在我的基准一些大洞。注意,对于任何列表超过100000更长的时间,你可以简单地返回真正
,因为会有重复。事实上,如果列表就像是随机的东西,你可以为一个更短的清单返回true WHP。什么是道德?在极限情况下,最有效的实现是:
返回true;
你不会是错的很频繁。
I have an array, and am looking for duplicates.
duplicates = false;
for(j = 0; j < zipcodeList.length; j++){
for(k = 0; k < zipcodeList.length; k++){
if (zipcodeList[k] == zipcodeList[j]){
duplicates = true;
}
}
}
However, this code doesnt work when there are no duplicates. Whys that?
On the nose answer..
duplicates=false;
for (j=0;j<zipcodeList.length;j++)
for (k=j+1;k<zipcodeList.length;k++)
if (k!=j && zipcodeList[k] == zipcodeList[j])
duplicates=true;
Edited to switch .equals()
back to ==
since I read somewhere you're using int
, which wasn't clear in the initial question. Also to set k=j+1
, to halve execution time, but it's still O(n2).
A faster (in the limit) way
Here's a hash based approach. You gotta pay for the autoboxing, but it's O(n) instead of O(n2). An enterprising soul would go find a primitive int-based hash set (Apache or Google Collections has such a thing, methinks.)
boolean duplicates(final int[] zipcodelist)
{
Set<Integer> lump = new HashSet<Integer>();
for (int i : zipcodelist)
{
if (lump.contains(i)) return true;
lump.add(i);
}
return false;
}
Bow to HuyLe
See HuyLe's answer for a more or less O(n) solution, which I think needs a couple of add'l steps:
static boolean duplicates(final int[] zipcodelist)
{
final int MAXZIP = 99999;
boolean[] bitmap = new boolean[MAXZIP+1];
java.util.Arrays.fill(bitmap, false);
for (int item : zipcodeList)
if (!bitmap[item]) bitmap[item] = true;
else return true;
}
return false;
}
Or Just to be Compact
static boolean duplicates(final int[] zipcodelist)
{
final int MAXZIP = 99999;
boolean[] bitmap = new boolean[MAXZIP+1]; // Java guarantees init to false
for (int item : zipcodeList)
if (!(bitmap[item] ^= true)) return true;
return false;
}
Does it Matter?
Well, so I ran a little benchmark, which is iffy all over the place, but here's the code:
import java.util.BitSet;
class Yuk
{
static boolean duplicatesZero(final int[] zipcodelist)
{
boolean duplicates=false;
for (int j=0;j<zipcodelist.length;j++)
for (int k=j+1;k<zipcodelist.length;k++)
if (k!=j && zipcodelist[k] == zipcodelist[j])
duplicates=true;
return duplicates;
}
static boolean duplicatesOne(final int[] zipcodelist)
{
final int MAXZIP = 99999;
boolean[] bitmap = new boolean[MAXZIP + 1];
java.util.Arrays.fill(bitmap, false);
for (int item : zipcodelist) {
if (!(bitmap[item] ^= true))
return true;
}
return false;
}
static boolean duplicatesTwo(final int[] zipcodelist)
{
final int MAXZIP = 99999;
BitSet b = new BitSet(MAXZIP + 1);
b.set(0, MAXZIP, false);
for (int item : zipcodelist) {
if (!b.get(item)) {
b.set(item, true);
} else
return true;
}
return false;
}
enum ApproachT { NSQUARED, HASHSET, BITSET};
/**
* @param args
*/
public static void main(String[] args)
{
ApproachT approach = ApproachT.BITSET;
final int REPS = 100;
final int MAXZIP = 99999;
int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 };
long[][] times = new long[sizes.length][REPS];
boolean tossme = false;
for (int sizei = 0; sizei < sizes.length; sizei++) {
System.err.println("Trial for zipcodelist size= "+sizes[sizei]);
for (int rep = 0; rep < REPS; rep++) {
int[] zipcodelist = new int[sizes[sizei]];
for (int i = 0; i < zipcodelist.length; i++) {
zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1));
}
long begin = System.currentTimeMillis();
switch (approach) {
case NSQUARED :
tossme ^= (duplicatesZero(zipcodelist));
break;
case HASHSET :
tossme ^= (duplicatesOne(zipcodelist));
break;
case BITSET :
tossme ^= (duplicatesTwo(zipcodelist));
break;
}
long end = System.currentTimeMillis();
times[sizei][rep] = end - begin;
}
long avg = 0;
for (int rep = 0; rep < REPS; rep++) {
avg += times[sizei][rep];
}
System.err.println("Size=" + sizes[sizei] + ", avg time = "
+ avg / (double)REPS + "ms");
}
}
}
With NSQUARED:
Trial for size= 10
Size=10, avg time = 0.0ms
Trial for size= 1000
Size=1000, avg time = 0.0ms
Trial for size= 10000
Size=10000, avg time = 100.0ms
Trial for size= 100000
Size=100000, avg time = 9923.3ms
With HashSet
Trial for zipcodelist size= 10
Size=10, avg time = 0.16ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.15ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.16ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms
With BitSet
Trial for zipcodelist size= 10
Size=10, avg time = 0.0ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.0ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.0ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms
BITSET Wins!
But only by a hair... .15ms is within the error for currentTimeMillis()
, and there are some gaping holes in my benchmark. Note that for any list longer than 100000, you can simply return true
because there will be a duplicate. In fact, if the list is anything like random, you can return true WHP for a much shorter list. What's the moral? In the limit, the most efficient implementation is:
return true;
And you won't be wrong very often.
这篇关于Java数组,查找重复的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!