Interview question: remove duplicates from an unsorted linked list

Question

I'm reading Cracking the Coding Interview, Fourth Edition: 150 Programming Interview Questions and Solutions and I'm trying to solve the following question:

2.1 Write code to remove duplicates from an unsorted linked list. FOLLOW UP: How would you solve this problem if a temporary buffer is not allowed?

I'm solving it in C#, so I made my own Node class:

public class Node<T> where T : class
{
    public Node<T> Next { get; set; }
    public T Value { get; set; }

    public Node(T value)
    {
        Next = null;
        Value = value;
    }
}
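
For reference, the first half of question 2.1 (where a temporary buffer is allowed) is commonly handled by recording already-seen values in a hash set during a single pass. Below is a minimal sketch of that baseline against the Node<T> class above; it is not code from the book, and note that HashSet falls back to EqualityComparer<T>.Default, so what counts as a "duplicate" depends on how T defines equality:

using System.Collections.Generic;

public static class BufferDedup
{
    // Buffer-allowed approach: one pass, O(n) time, O(n) extra space.
    public static void RemoveDuplicatesWithBuffer<T>(Node<T> head) where T : class
    {
        var seen = new HashSet<T>();
        Node<T> previous = null;

        for (var current = head; current != null; current = current.Next)
        {
            if (seen.Add(current.Value))
            {
                previous = current;              // first occurrence: keep it and advance
            }
            else
            {
                previous.Next = current.Next;    // duplicate: unlink the current node
            }
        }
    }
}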

My solution is to iterate through the list, then for each node iterate through the remainder of the list and remove any duplicates (note that I haven't actually compiled or tested this, as instructed by the book):

public void RemoveDuplicates(Node<T> head)
{
    // Iterate through the list
    Node<T> iter = head;
    while(iter != null)
    {
        // Iterate through the remaining nodes in the list
        Node<T> current = iter;
        while(current != null && current.Next != null)
        {
            if(iter.Value == current.Next.Value)
            {
                current.Next = current.Next.Next;
            }

            current = current.Next;
        }    

        iter = iter.Next;
    }
}

Here is the solution from the book (the author wrote it in Java):

Without a buffer, we can iterate with two pointers: "current" does a normal iteration, while "runner" iterates through all prior nodes to check for dups. Runner will only see one dup per node, because if there were multiple duplicates they would have been removed already.

public static void deleteDups2(LinkedListNode head) 
{
    if (head == null) return;

    LinkedListNode previous = head;
    LinkedListNode current = previous.next;

    while (current != null) 
    {
        LinkedListNode runner = head;

        while (runner != current) { // Check for earlier dups
            if (runner.data == current.data) 
            {
                LinkedListNode tmp = current.next; // remove current
                previous.next = tmp;
                current = tmp; // update current to next node
                break; // all other dups have already been removed
            }
            runner = runner.next;
        }
        if (runner == current) { // current not updated - update now
            previous = current;
            current = current.next;
        }
    }
}

So my solution always looks for duplicates from the current node to the end, while their solution looks for duplicates from the head to the current node. I feel like both solutions would suffer performance issues depending on how many duplicates there are in the list and how they're distributed (density and position). But in general: is my answer nearly as good as the one in the book or is it significantly worse?
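
To get a concrete feel for how duplicate density and position affect the two strategies, one option is to count value comparisons on representative inputs. The sketch below is my own instrumentation, loosely modeled on the two approaches (scan ahead of the current node versus scan behind it) rather than the exact code above, and it uses a plain int node so it stays self-contained:

using System;
using System.Collections.Generic;
using System.Linq;

// Rough instrumentation sketch (not the question's or the book's code):
// count how many value comparisons each strategy performs on the same input.
class ComparisonCounts
{
    class N { public int V; public N Next; }

    static N Build(IEnumerable<int> values)
    {
        N head = null, tail = null;
        foreach (var v in values)
        {
            var node = new N { V = v };
            if (head == null) head = node; else tail.Next = node;
            tail = node;
        }
        return head;
    }

    // "Scan ahead" strategy: for each node, walk the remainder and unlink matches.
    static long ScanAhead(N head)
    {
        long comparisons = 0;
        for (var iter = head; iter != null; iter = iter.Next)
        {
            var current = iter;
            while (current.Next != null)
            {
                comparisons++;
                if (iter.V == current.Next.V) current.Next = current.Next.Next;
                else current = current.Next;
            }
        }
        return comparisons;
    }

    // "Scan behind" strategy: for each node, walk the already-kept prefix.
    static long ScanBehind(N head)
    {
        long comparisons = 0;
        if (head == null) return 0;
        N previous = head, current = head.Next;
        while (current != null)
        {
            var runner = head;
            var removed = false;
            while (runner != current)
            {
                comparisons++;
                if (runner.V == current.V)
                {
                    previous.Next = current.Next;   // unlink current
                    current = current.Next;
                    removed = true;
                    break;
                }
                runner = runner.Next;
            }
            if (!removed) { previous = current; current = current.Next; }
        }
        return comparisons;
    }

    static void Main()
    {
        var allDistinct = Enumerable.Range(0, 200).ToArray();
        var allSame = Enumerable.Repeat(7, 200).ToArray();
        var halfDuplicated = Enumerable.Range(0, 100).Concat(Enumerable.Range(0, 100)).ToArray();

        foreach (var (name, data) in new[] { ("all distinct", allDistinct), ("all same", allSame), ("second half repeats first", halfDuplicated) })
            Console.WriteLine($"{name}: ahead={ScanAhead(Build(data))}, behind={ScanBehind(Build(data))}");
    }
}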

Answer

If you give a person a fish, they eat for a day. If you teach a person to fish...

My measures for the quality of an implementation are:

  • Correctness: If you aren't getting the right answer in all cases, then it isn't ready
  • Readability/maintainability: Look at code repetition, understandable names, the number of lines of code per block/method (and the number of things each block does), and how difficult it is to trace the flow of your code. Look at any number of books focused on refactoring, programming best-practices, coding standards, etc, if you want more information on this.
  • Theoretical performance (worst-case and amortized): Big-O is a metric you can use. CPU and memory consumption should both be measured
  • Complexity: Estimate how long it would take an average professional programmer to implement (if they already know the algorithm). See if that is in line with how difficult the problem actually is

Regarding your implementation:

  • Correctness: I suggest writing unit tests to determine this for yourself and/or debugging it (on paper) from start to finish with interesting sample/edge cases. Null, one item, two items, various numbers of duplicates, etc (a minimal harness along these lines is sketched after this list)
  • Readability/maintainability: It looks mostly fine, though your last two comments don't add anything. It is a bit more obvious what your code does than the code in the book
  • Performance: I believe both are N-squared. Whether the amortized cost is lower on one or the other I'll let you figure out :)
  • Time to implement: An average professional should be able to code this algorithm in their sleep, so looking good
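
As a starting point for the unit tests suggested in the first bullet, here is a minimal console harness covering those edge cases. It assumes the question's Node<T> class and RemoveDuplicates method are in scope; LinkedListOps<T> is an invented name for whatever class hosts RemoveDuplicates, so adapt that one call to your own code:

using System;
using System.Collections.Generic;
using System.Linq;

static class DedupTests
{
    static Node<string> Build(params string[] values)
    {
        Node<string> head = null, tail = null;
        foreach (var v in values)
        {
            var node = new Node<string>(v);
            if (head == null) head = node; else tail.Next = node;
            tail = node;
        }
        return head;
    }

    static List<string> Collect(Node<string> head)
    {
        var values = new List<string>();
        for (var n = head; n != null; n = n.Next) values.Add(n.Value);
        return values;
    }

    static void Check(string name, string[] input, string[] expected)
    {
        var head = Build(input);
        new LinkedListOps<string>().RemoveDuplicates(head);   // invented host class; point this at your RemoveDuplicates
        var actual = Collect(head);
        Console.WriteLine($"{name}: {(actual.SequenceEqual(expected) ? "PASS" : "FAIL")} (got [{string.Join(", ", actual)}])");
    }

    static void Main()
    {
        // Edge cases from the answer: empty, one item, two items, various duplicate patterns.
        Check("empty", new string[0], new string[0]);
        Check("single", new[] { "a" }, new[] { "a" });
        Check("two distinct", new[] { "a", "b" }, new[] { "a", "b" });
        Check("two equal", new[] { "a", "a" }, new[] { "a" });
        Check("run of equal values", new[] { "a", "a", "a" }, new[] { "a" });
        Check("scattered duplicates", new[] { "a", "b", "a", "c", "b" }, new[] { "a", "b", "c" });
    }
}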
