Java数组,查找重复 [英] Java Array, Finding Duplicates

查看:146
本文介绍了Java数组,查找重复的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数组,并正在寻找重复。

 重复= FALSE;
为(J = 0; J<拉链codeList.length; J ++){
    对于(K = 0; K<拉链codeList.length; k ++){
        如果(ZIP codeLIST [K] ==拉链codeLIST [J]){
            重复= TRUE;
        }
    }
}

不过,这code不工作时,有没有重复。个为什么吗?


解决方案

在鼻子答案..

 重复= FALSE;
为(J = 0; J<拉链codeList.length; J ++)
  对于(K = J + 1; K<拉链codeList.length; k ++)
    如果(K = J&安培;!&安培;邮编codeLIST [K] ==拉链codeLIST [J]。)
      重复= TRUE;

编辑切换 .equals() == ,因为我读的地方你使用 INT ,这是不是在最初的问题清楚。还设置 K = J + 1 ,减半执行时间,但它仍然为O(n 2 )。

更快(在极限)的方式

下面是一个哈希为基础的方法。你必须对自动装​​箱的薪酬,但它是为O(n),而不是为O(n 2 )。一个积极进取的灵魂会去找到一个原始的基于INT-哈希集合(Apache或谷歌集合有这样的事情,记错。)

 布尔重复(最终诠释[]拉链codeLIST)
{
  SET<整数GT;肿块=新的HashSet<整数GT;();
  对于(INT I:拉链codeLIST)
  {
    如果(lump.contains(I)),则返回true;
    lump.add(ⅰ);
  }
  返回false;
}

低头HuyLe

请参阅HuyLe's回答以或多或少O(n)的解决方案,我认为需要一对夫妇的用户互动,增加步骤:

 静态布尔重复(最终诠释[]拉链codeLIST)
{
   最终诠释MAXZIP = 99999;
   布尔[] =位图新的布尔[MAXZIP + 1];
   java.util.Arrays.fill(位图,FALSE);
   对于(INT项目:拉链codeLIST)
     (!位图[项目])如果位[项目] =真;
     否则返回true;
   }
   返回false;
}

或只是紧凑

 静态布尔重复(最终诠释[]拉链codeLIST)
{
   最终诠释MAXZIP = 99999;
   布尔[] =位图新的布尔[MAXZIP + 1]; // Java的保证初始化为假
   对于(INT项目:拉链codeLIST)
     如果(!(位图[项目] ^ = TRUE))返回true;
   返回false;
}

有什么关系?

嘛,所以我跑了一个小的基准,这是前途未卜所有的地方,但这里的code:

 进口java.util.BitSet中;类育
{
  静态布尔duplicatesZero(最终诠释[]拉链codeLIST)
  {
    布尔重复= FALSE;
    对于(INT J = 0; J<拉链codelist.length; J ++)
      对于(INT K = J + 1; K<拉链codelist.length; k ++)
        如果(K = J&安培;!&安培;邮编codeLIST [K] ==拉链codeLIST [J]。)
          重复= TRUE;    返回副本;
  }
  静态布尔duplicatesOne(最终诠释[]拉链codeLIST)
  {
    最终诠释MAXZIP = 99999;
    布尔[] =位图新的布尔[MAXZIP + 1];
    java.util.Arrays.fill(位图,FALSE);
    对于(INT项目:拉链codeLIST){
      如果(!(位图[项目] ^ = TRUE))
        返回true;
    }
    返回false;
  }  静态布尔duplicatesTwo(最终诠释[]拉链codeLIST)
  {
    最终诠释MAXZIP = 99999;    位集合B =新的BitSet(MAXZIP + 1);
    b.set(0,MAXZIP,FALSE);
    对于(INT项目:拉链codeLIST){
      如果(!b.get(项目)){
        b.set(项目,真正的);
      }其他
        返回true;
    }
    返回false;
  }  枚举ApproachT {NSQUARED,HashSet的,BITSET};  / **
   * @参数ARGS
   * /
  公共静态无效的主要(字串[] args)
  {
    ApproachT方法= ApproachT.BITSET;    最终诠释REPS = 100;
    最终诠释MAXZIP = 99999;    INT [] =大小INT新[] {10,1000,10000,100000,百万};
    长[] []次=新长[sizes.length] [REPS];    布尔tossme = FALSE;    对(INT sizei = 0; sizei&下; sizes.length; sizei ++){
      通信System.err.println(试行对于ZIP codeLIST大小=+尺寸[sizei]);
      对于(INT代表= 0;代表<名代表;代表++){
        INT []拉链codeLIST =新的INT [大小[sizei]];
        的for(int i = 0; I<拉链codelist.length;我++){
          拉链codeLIST [I] =(int)的(的Math.random()*(MAXZIP + 1));
        }
        长开始= System.currentTimeMillis的();
        开关(方法){
        案例NSQUARED:
          tossme ^ =(duplicatesZero(邮政编码codeLIST));
          打破;
        案例HashSet的:
          tossme ^ =(duplicatesOne(邮政编码codeLIST));
          打破;
        案例BITSET:
          tossme ^ =(duplicatesTwo(邮政编码codeLIST));
          打破;        }
        长端= System.currentTimeMillis的();
        次[sizei] [代表] =结束 - 开始;
      }
      长期平均= 0;
      对于(INT代表= 0;代表<名代表;代表++){
        平均+ =倍[sizei] [代表];
      }
      通信System.err.println(大小=+尺寸[sizei] +,平均时间=
            +平均/(双)REPS +毫秒);
    }
  }}

使用NSQUARED:

 审判的大小= 10
大小= 10,平均时间为0.0ms
审判大小= 1000
大小= 1000,平均时间为0.0ms
审判大小= 10000
大小= 10000,平均时间为100.0ms
审判大小= 100000
大小= 100000,平均时间为9923.3ms

使用的HashSet

 试用对于ZIP codeLIST大小= 10
大小= 10,平均时间= 0.16ms
试用对于ZIP codeLIST大小= 1000
大小= 1000,平均时间= 0.15ms
试用对于ZIP codeLIST大小= 10000
大小= 10000,平均时间为0.0ms
试用对于ZIP codeLIST大小= 100000
大小= 100000,平均时间= 0.16ms
试用对于ZIP codeLIST大小= 1000000
大小= 1000000,平均时间为0.0ms

使用的BitSet

 试用对于ZIP codeLIST大小= 10
大小= 10,平均时间为0.0ms
试用对于ZIP codeLIST大小= 1000
大小= 1000,平均时间为0.0ms
试用对于ZIP codeLIST大小= 10000
大小= 10000,平均时间为0.0ms
试用对于ZIP codeLIST大小= 100000
大小= 100000,平均时间为0.0ms
试用对于ZIP codeLIST大小= 1000000
大小= 1000000,平均时间为0.0ms

BITSET获胜!

但是,只有通过发... .15ms是误差的currentTimeMillis()之内,并且有在我的基准一些大洞。注意,对于任何列表超过100000更长的时间,你可以简单地返回真正,因为会有重复。事实上,如果列表就像是随机的东西,你可以为一个更短的清单返回true WHP。什么是道德?在极限情况下,最有效的实现是:

 返回true;

你不会是错的很频繁。

I have an array, and am looking for duplicates.

duplicates = false;
for(j = 0; j < zipcodeList.length; j++){
    for(k = 0; k < zipcodeList.length; k++){
        if (zipcodeList[k] == zipcodeList[j]){
            duplicates = true;
        }
    }
}

However, this code doesnt work when there are no duplicates. Whys that?

解决方案

On the nose answer..

duplicates=false;
for (j=0;j<zipcodeList.length;j++)
  for (k=j+1;k<zipcodeList.length;k++)
    if (k!=j && zipcodeList[k] == zipcodeList[j])
      duplicates=true;

Edited to switch .equals() back to == since I read somewhere you're using int, which wasn't clear in the initial question. Also to set k=j+1, to halve execution time, but it's still O(n2).

A faster (in the limit) way

Here's a hash based approach. You gotta pay for the autoboxing, but it's O(n) instead of O(n2). An enterprising soul would go find a primitive int-based hash set (Apache or Google Collections has such a thing, methinks.)

boolean duplicates(final int[] zipcodelist)
{
  Set<Integer> lump = new HashSet<Integer>();
  for (int i : zipcodelist)
  {
    if (lump.contains(i)) return true;
    lump.add(i);
  }
  return false;
}

Bow to HuyLe

See HuyLe's answer for a more or less O(n) solution, which I think needs a couple of add'l steps:

static boolean duplicates(final int[] zipcodelist)
{
   final int MAXZIP = 99999;
   boolean[] bitmap = new boolean[MAXZIP+1];
   java.util.Arrays.fill(bitmap, false);
   for (int item : zipcodeList)
     if (!bitmap[item]) bitmap[item] = true;
     else return true;
   }
   return false;
}

Or Just to be Compact

static boolean duplicates(final int[] zipcodelist)
{
   final int MAXZIP = 99999;
   boolean[] bitmap = new boolean[MAXZIP+1];  // Java guarantees init to false
   for (int item : zipcodeList)
     if (!(bitmap[item] ^= true)) return true;
   return false;
}

Does it Matter?

Well, so I ran a little benchmark, which is iffy all over the place, but here's the code:

import java.util.BitSet;

class Yuk
{
  static boolean duplicatesZero(final int[] zipcodelist)
  {
    boolean duplicates=false;
    for (int j=0;j<zipcodelist.length;j++)
      for (int k=j+1;k<zipcodelist.length;k++)
        if (k!=j && zipcodelist[k] == zipcodelist[j])
          duplicates=true;

    return duplicates;
  }


  static boolean duplicatesOne(final int[] zipcodelist)
  {
    final int MAXZIP = 99999;
    boolean[] bitmap = new boolean[MAXZIP + 1];
    java.util.Arrays.fill(bitmap, false);
    for (int item : zipcodelist) {
      if (!(bitmap[item] ^= true))
        return true;
    }
    return false;
  }

  static boolean duplicatesTwo(final int[] zipcodelist)
  {
    final int MAXZIP = 99999;

    BitSet b = new BitSet(MAXZIP + 1);
    b.set(0, MAXZIP, false);
    for (int item : zipcodelist) {
      if (!b.get(item)) {
        b.set(item, true);
      } else
        return true;
    }
    return false;
  }

  enum ApproachT { NSQUARED, HASHSET, BITSET};

  /**
   * @param args
   */
  public static void main(String[] args)
  {
    ApproachT approach = ApproachT.BITSET;

    final int REPS = 100;
    final int MAXZIP = 99999;

    int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 };
    long[][] times = new long[sizes.length][REPS];

    boolean tossme = false;

    for (int sizei = 0; sizei < sizes.length; sizei++) {
      System.err.println("Trial for zipcodelist size= "+sizes[sizei]);
      for (int rep = 0; rep < REPS; rep++) {
        int[] zipcodelist = new int[sizes[sizei]];
        for (int i = 0; i < zipcodelist.length; i++) {
          zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1));
        }
        long begin = System.currentTimeMillis();
        switch (approach) {
        case NSQUARED :
          tossme ^= (duplicatesZero(zipcodelist));
          break;
        case HASHSET :
          tossme ^= (duplicatesOne(zipcodelist));
          break;
        case BITSET :
          tossme ^= (duplicatesTwo(zipcodelist));
          break;

        }
        long end = System.currentTimeMillis();
        times[sizei][rep] = end - begin;


      }
      long avg = 0;
      for (int rep = 0; rep < REPS; rep++) {
        avg += times[sizei][rep];
      }
      System.err.println("Size=" + sizes[sizei] + ", avg time = "
            + avg / (double)REPS + "ms");
    }
  }

}

With NSQUARED:

Trial for size= 10
Size=10, avg time = 0.0ms
Trial for size= 1000
Size=1000, avg time = 0.0ms
Trial for size= 10000
Size=10000, avg time = 100.0ms
Trial for size= 100000
Size=100000, avg time = 9923.3ms

With HashSet

Trial for zipcodelist size= 10
Size=10, avg time = 0.16ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.15ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.16ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms

With BitSet

Trial for zipcodelist size= 10
Size=10, avg time = 0.0ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.0ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.0ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms

BITSET Wins!

But only by a hair... .15ms is within the error for currentTimeMillis(), and there are some gaping holes in my benchmark. Note that for any list longer than 100000, you can simply return true because there will be a duplicate. In fact, if the list is anything like random, you can return true WHP for a much shorter list. What's the moral? In the limit, the most efficient implementation is:

 return true;

And you won't be wrong very often.

这篇关于Java数组,查找重复的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆