面试题:从未排序的链表中删除重复项 [英] Interview question: remove duplicates from an unsorted linked list

查看:19
本文介绍了面试题:从未排序的链表中删除重复项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在阅读 破解编码面试,第四版:150 个编程面试问题和解决方案,我正在尝试解决以下问题:

I'm reading Cracking the Coding Interview, Fourth Edition: 150 Programming Interview Questions and Solutions and I'm trying to solve the following question:

2.1 编写代码从未排序的链表中删除重复项.跟随UP:如果遇到这个问题,你会怎么解决?不允许使用临时缓冲区?

2.1 Write code to remove duplicates from an unsorted linked list. FOLLOW UP: How would you solve this problem if a temporary buffer is not allowed?

我正在用 C# 解决它,所以我创建了自己的 Node 类:

I'm solving it in C#, so I made my own Node class:

public class Node<T> where T : class
{
    public Node<T> Next { get; set; }
    public T Value { get; set; }

    public Node(T value)
    {
        Next = null;
        Value = value;
    }
}

我的解决方案是遍历列表,然后对每个节点遍历列表的其余部分并删除任何重复项(请注意,我并没有按照书中的说明实际编译或测试它):

My solution is to iterate through the list, then for each node to iterated through the remainder of the list and remove any duplicates (note that I haven't actually compiled or tested this, as instructed by the book):

public void RemoveDuplicates(Node<T> head)
{
    // Iterate through the list
    Node<T> iter = head;
    while(iter != null)
    {
        // Iterate to the remaining nodes in the list
        Node<T> current = iter;
        while(current!= null && current.Next != null)
        {
            if(iter.Value == current.Next.Value)
            {
                current.Next = current.Next.Next;
            }

            current = current.Next;
        }    

        iter = iter.Next;
    }
}

这里是书中的解决方案(作者用java写的):

Here is the solution from the book (the author wrote it in java):

没有缓冲区,我们可以迭代两个指针:当前"是否正常迭代,而runner"迭代通过所有先前的节点来检查重复.跑步者每人只会看到一个 dup节点,因为如果有多个他们本来是重复的已经删除了.

Without a buffer, we can iterate with two pointers: "current" does a normal iteration, while "runner" iterates through all prior nodes to check for dups. Runner will only see one dup per node, because if there were multiple duplicates they would have been removed already.

public static void deleteDups2(LinkedListNode head) 
{
    if (head == null) return;

    LinkedListNode previous = head;
    LinkedListNode current = previous.next;

    while (current != null) 
    {
        LinkedListNode runner = head;

        while (runner != current) { // Check for earlier dups
            if (runner.data == current.data) 
            {
                LinkedListNode tmp = current.next; // remove current
                previous.next = tmp;
                current = tmp; // update current to next node
                break; // all other dups have already been removed
            }
            runner = runner.next;
        }
        if (runner == current) { // current not updated - update now
            previous = current;
            current = current.next;
        }
    }
}

所以我的解决方案总是寻找当前节点到最后的重复项,而他们的解决方案则寻找从头到当前节点的重复项.我觉得这两种解决方案都会遇到性能问题,具体取决于列表中有多少重复项以及它们的分布方式(密度和位置).但总的来说:我的答案是几乎和书中的答案一样好,还是明显更差?

So my solution always looks for duplicates for the current node to the end, while their solution looks for duplicates from the head to the current node. I feel like both solutions would suffer performance issues depending on how many duplicates there are in the list and how they're distributed (density and position). But in general: is my answer nearly as good as the one in the book or is it significantly worse?

推荐答案

如果你给一个人一条鱼,他们可以吃一天.如果你教一个人钓鱼......

If you give a person a fish, they eat for a day. If you teach a person to fish...

我对实施质量的衡量标准是:

My measures for the quality of an implementation are:

  • 正确性:如果您在所有情况下都没有得到正确的答案,则说明还没有准备好
  • 可读性/可维护性:查看代码重复、可理解的名称、每个块/方法的代码行数(以及每个块执行的操作数)以及跟踪的难度你的代码流.如果您想了解更多相关信息,请查看任意数量的关于重构、编程最佳实践、编码标准等方面的书籍.
  • 理论性能(最坏情况和摊销):Big-O 是您可以使用的指标.应同时测量 CPU 和内存消耗
  • 复杂性:估计一个普通的专业程序员需要如何实现(如果他们已经知道算法).看看这是否符合问题的实际难度
  • Correctness: If you aren't getting the right answer in all cases, then it isn't ready
  • Readability/maintainability: Look at code repetition, understandable names, the number of lines of code per block/method (and the number of things each block does), and how difficult it is to trace the flow of your code. Look at any number of books focused on refactoring, programming best-practices, coding standards, etc, if you want more information on this.
  • Theoretical performance (worst-case and ammortized): Big-O is a metric you can use. CPU and memory consumption should both be measured
  • Complexity: Estimate how it would take an average professional programmer to implement (if they already know the algorithm). See if that is in line with how difficult the problem actually is

至于您的实施:

  • 正确性:我建议自己编写单元测试来确定这一点和/或使用有趣的示例/边缘情况从头到尾调试它(在纸上).空、一项、两项、各种重复次数等
  • 可读性/可维护性:看起来基本没问题,尽管您的最后两条评论没有添加任何内容.你的代码所做的事情比书中的代码更明显一些
  • 性能:我相信两者都是 N 平方的.摊销成本是否较低,我会让你弄清楚:)
  • 是时候实现了:一个普通的专业人士应该能够在睡梦中编写这个算法,所以看起来不错
  • Correctness: I suggest writing unit tests to determine this for yourself and/or debugging it (on paper) from start to finish with interesting sample/edge cases. Null, one item, two items, various numbers of duplicates, etc
  • Readability/maintainability: It looks mostly fine, though your last two comments don't add anything. It is a bit more obvious what your code does than the code in the book
  • Performance: I believe both are N-squared. Whether the amortized cost is lower on one or the other I'll let you figure out :)
  • Time to implement: An average professional should be able to code this algorithm in their sleep, so looking good

这篇关于面试题:从未排序的链表中删除重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆