算法删除阵列中没有辅助存储的重复元素 [英] algorithm removing duplicate elements in array without auxillay storage

查看:141
本文介绍了算法删除阵列中没有辅助存储的重复元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在致力于在数组中删除​​重复元素的这个着名的面试问题,而不使用辅助存储并保留订单;

I am working on this famous interview question on removing duplicate elements in array without using auxillary storage and preserving the order;

我已经阅读了一堆帖子; 算法:从数组中删除重复整数的有效方法使用C 。

I have read a bunch of posts; Algorithm: efficient way to remove duplicate integers from an array, Removing Duplicates from an Array using C.

它们是在 C (不解释)或 Java代码提供的仅在连续重复的情况下失败,例如 [1,1,1,3,3]

They are either implemented in C (without explanation) or the Java Code provided just fails when there is consecutive duplicates such as [1,1,1,3,3].

我不太自信地使用 C ,我的背景是 Java 。所以我自己实现了代码
如下所示:

I am not quite confident with using C, my background is Java. So I implemented the code myself; it follows like this:


  1. 使用两个循环,外部循环遍历数组,内部循环检查重复项,如果现在将其替换为null。

  2. 然后我再次执行duplicate-replace-null数组,并删除空元素并将其替换为下一个非空元素。

  3. 我现在看到的总运行时间是 O(n ^ 2)+ O(n)〜O(n ^ 2)。阅读上述帖子,我明白这是我们能做的最好的,如果不允许排序和辅助存储。
    我的代码在这里:我正在寻找方法来优化任何进一步的(如果有可能)或更好/简单的逻辑;

  1. use two loops, the outer-loop traverses the array and inner loop checks for duplicates and if present replace it with null.
  2. Then I go over the duplicate-replaced-null array and remove null elements and replacing it with the next non-null element.
  3. The total run-time I see now is O(n^2)+O(n) ~ O(n^2). Reading the above posts, I understood this is the best we can do, if no sorting and auxiliary storage is allowed. My code is here: I am looking for ways to optimize any further (if there is a possibility) or a better/simplisitc logic;

public class RemoveDup {
    public static void main (String[] args){
        Integer[]  arr2={3,45,1,2,3,3,3,3,2,1,45,2,10};
            Integer[] res= removeDup(arr2);
                System.out.println(Arrays.toString(res));
            }
          private static Integer[] removeDup(Integer[] data) {
            int size = data.length;
            int count = 1;
                for (int i = 0; i < size; i++) {
                    Integer temp = data[i];
                    for (int j = i + 1; j < size && temp != null; j++) {
                        if (data[j] == temp) {
                            data[j] = null;
                        }
                    }
                }
                for (int i = 1; i < size; i++) {
                    Integer current = data[i];
                    if (data[i] != null) {
                        data[count++] = current;
                    }
                }

                return Arrays.copyOf(data, count);

         }

}

编辑1;来自@keshlam的重新格式化的代码抛出ArrayIndexOutofBound异常:

EDIT 1; Reformatted code from @keshlam throws ArrayIndexOutofBound Exception:

private static int removeDupes(int[] array) {
        System.out.println("method called");
        if(array.length < 2)
          return array.length;

        int outsize=1; // first is always kept

     for (int consider = 1; consider < array.length; ++consider) {

          for(int compare=0;compare<outsize;++compare) {
            if(array[consider]!=array[compare])
                array[outsize++]=array[consider]; // already present; advance to next compare
           else break;
          // if we get here, we know it's new so append it to output
          //array[outsize++]=array[consider]; // could test first, not worth it. 

        }

      }
        System.out.println(Arrays.toString(array));
         // length is last written position plus 1
        return outsize;
    }


推荐答案

这应该是O(N * N)最坏的情况。 (由于更小的常数,因为我测试N平均为1/2 N,但这是计算机科学而不是软件工程,只有2X的加速并不重要,感谢@Alexandru ,指出这一点。)

OK, here's my answer, which should be O(N*N) worst case. (With smaller constant, since even worstcase I'm testing N against -- on average -- 1/2 N, but this is computer science rather than software engineering and a mere 2X speedup isn't significant. Thanks to @Alexandru for pointing that out.)

1)拆分光标(输入和输出分别提前),

1) Split cursor (input and output advanced separately),

2)每个新值只能与已经保留的值进行比较,如果找到匹配,比较可能会停止。 (提示关键字为增量)

2) Each new value only has to be compared to what's already been kept, and compare can stop if a match is found. (The hint keyword was "incremental")

3)第一个元素不需要测试。

3) First element need not be tested.

4)我正在利用标签为继续的优势,我可以在 break ing之前设置一个标志,然后测试标志。出来同样的事情;这是一个比较优雅的。

4) I'm taking advantage of labelled continue where I could have instead set a flag before breaking and then tested the flag. Comes out to the same thing; this is a bit more elegant.

4.5)我可以测试,如果这是真的,是否考虑,而不是复制。但测试它将需要与执行可能不必要的副本一样多的周期,并且大多数情况下,他们将不会相同,所以只需让可能的冗余副本变得更容易

4.5) I could have tested whether outsize==consider and not copied if that was true. But testing for it would take about as many cycles as doing the possibly-unnecessary copy, and the majority case is that they will not be the same, so it's easier to just let a possibly redundant copy take place.

5)我没有重新键入关键功能中的数据;我已经将打印副本操​​作分解为一个单独的功能,以清楚地表明 removeDupes 完全在目标数组中运行,加上堆栈上的几个自动变量。而且我没有花时间在数组末尾清除剩下的元素;这可能是浪费的工作(如在这种情况下)。虽然我不认为它实际上会改变正式的复杂性。

5) I'm not recopying the data in the key function; I've factored out the copy-for-printing operation to a separate function to make clear that removeDupes does run entirely in the target array plus a few automatic variables on the stack. And I'm not spending time zeroing out the leftover elements at the end of the array; that may be wasted work (as in this case). Though I don't think it would actually change the formal complexity.

import java.util.Arrays;

public class RemoveDupes {

  private static int removeDupes(final int[] array) {
    if(array.length < 2)
      return array.length;

    int outsize=1; // first is always kept

    outerloop: for (int consider = 1; consider < array.length; ++consider) {

      for(int compare=0;compare<outsize;++compare)
        if(array[consider]==array[compare])
          continue outerloop; // already present; advance to next compare

      // if we get here, we know it's new so append it to output
      array[outsize++]=array[consider]; // could test first, not worth it. 
    }

    return outsize; // length is last written position plus 1
  }

  private static void printRemoveDupes(int[] array) {
    int newlength=removeDupes(array);
    System.out.println(Arrays.toString(Arrays.copyOfRange(array, 0, newlength)));
  }

  public static void main(final String[] args) {
    printRemoveDupes(new int[] { 3, 45, 1, 2, 3, 3, 3, 3, 2, 1, 45, 2, 10 });
    printRemoveDupes(new int[] { 2, 2, 3, 3 });
    printRemoveDupes(new int[] { 1, 1, 1, 1, 1, 1, 1, 1 });
  }
}

LATE ADDITION:人们对我的解释中的第4点表示混淆,这里是重写的循环,没有标记为 continue

LATE ADDITION: Since folks expressed confusion about point 4 in my explanation, here's the loop rewritten without labelled continue:

for (int consider = 1; consider < array.length; ++consider) {
  boolean matchfound=false;

  for(int compare=0;compare<outsize;++compare) {
    if(array[consider]==array[compare]) {
      matchfound=true;
      break;
    }

    if(!matchFound) // only add it to the output if not found
      array[outsize++]=array[consider];
}

希望有所帮助。标记为 continue 是Java的一个很少使用的功能,所以有些人以前没有看到过,这并不奇怪。这是有用的,但它确实使代码更难阅读;我可能不会使用它比这个简单的算法复杂得多。

Hope that helps. Labelled continue is a rarely-used feature of Java, so it isn't too surprising that some folks haven't seen it before. It's useful, but it does make code harder to read; I probably wouldn't use it in anything much more complicated than this simple algorithm.

这篇关于算法删除阵列中没有辅助存储的重复元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆