为什么OrderBy返回IOrderedEnumerable< T>比排序快得多? [英] Why is OrderBy which returns IOrderedEnumerable<T> much faster than Sort?

查看:32
本文介绍了为什么OrderBy返回IOrderedEnumerable< T>比排序快得多?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个很好的问题的后续操作, C#排序和排序比较.我将使用相同的示例:

This is a follow up of this excellent question C# Sort and OrderBy comparison. I will use the same example:

List<Person> persons = new List<Person>();
persons.Add(new Person("P005", "Janson"));
persons.Add(new Person("P002", "Aravind"));
persons.Add(new Person("P007", "Kazhal"));

争用的方法是:

persons.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true));
//and
persons.OrderBy(n => n.Name);

首先,我要说的是,我知道没有任何明显的性能差异值得担心.但是我很想知道为什么 OrderBy 的性能要比 Sort 好得多.我正在使用@phoog在原始问题中发布的答案.

Let me start by saying that I understand there isn't any significant performance difference to worry about. But I would love to know why does OrderBy perform so much better than Sort. I'm using the answer posted by @phoog in the original question.

private void button1_Click(object sender, EventArgs e)
{
    IEnumerable<Person> people;

    BenchMark(persons => persons.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true)));

    BenchMark(persons => people = persons.OrderBy(n => n.Name));
}

private static Random randomSeed = new Random();
public static string RandomString(int size, bool lowerCase)
{
    var sb = new StringBuilder(size);
    int start = (lowerCase) ? 97 : 65;
    for (int i = 0; i < size; i++)
    {
        sb.Append((char)(26 * randomSeed.NextDouble() + start));
    }
    return sb.ToString();
}

private static void BenchMark(Action<List<Person>> action)
{
    List<Person> persons = new List<Person>();
    for (int i = 0; i < 10000; i++)
    {
        persons.Add(new Person("P" + i.ToString(), RandomString(5, true)));
    }
    List<Person> unsortedPersons = new List<Person>(persons);

    Stopwatch watch = new Stopwatch();
    for (int i = 0; i < 100; i++)
    {
        watch.Start();

        action(persons);

        watch.Stop();
        persons.Clear();
        persons.AddRange(unsortedPersons);
    }

    MessageBox.Show(watch.Elapsed.TotalMilliseconds.ToString());
}

结果:

Sort() => 3500 ~ 5000 ms
OrderBy() => 0.2 ~ 1.5 ms

尽管我最初测试的列表较小,但差异仍然很大,但随着收藏数量的增加,它变得越来越耀眼.可能是我缺少了解.NET集合的关键,但是我的想法是,由于 Sort 对现有的 List< T> 起作用,因此它的开销应该较小(如果与处理同一 List< T> (在我们的情况下为 person )的 OrderBy 相比,但必须返回另一个集合 IOrderedEnumerable< T> .但是 OrderBy 的效果仍然要好得多.与 IEnumerable< T> 类型相比, List< T> 可能会有一定的开销,但是 Sort 仍然会作用于现有列表!此外,我很高兴看到 Linq 方法比现有的.NET方法运行得更快.

Though differences were profound even with smaller lists I tested initially, it became more and more glaring once the size of the collection went up. May be I'm missing something key to understanding .NET collections, but my thinking is since Sort acts on an existing List<T>, it should have lesser overhead (if every any) in processing when compared to OrderBy which acts on the same List<T> (in our case persons) but have to return another collection IOrderedEnumerable<T>. But still OrderBy performs far far better. List<T> might have certain overhead compared to IEnumerable<T> type, but Sort anyway acts on the existing list! Furthermore, I'm little amused to see a Linq method working faster than existing .NET method.

原始问题中的所有答案将 Sort OrderBy.ToList 进行比较,我认为这会产生一些开销,因此或多或少地表现均等.

All the answers in the original question compare Sort against OrderBy.ToList which I believe will have some overhead and therefore performs more or less equally.

实现上可能有什么区别?

What could be the implementation differences?

编辑:好,我学到了一些新知识.这是我确认推迟执行的方式.

Ok I learned something new. Here is how I confirmed about deferred execution.

private void button1_Click(object sender, EventArgs e)
{
    BenchMark(persons =>
    {
        persons.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true));
        foreach (var item in persons)
        {
            break;
        }
    });

    BenchMark(persons =>
    {
        IEnumerable<Person> people = persons.OrderBy(n => n.Name);
        foreach (var item in people)
        {
            break;
        }
    });
}

排序运行了4000-5000ms,而 OrderBy 运行了5000ms以上.所以确实我的结论是错误的.一旦我开始枚举集合,他们两个人的表现均等.我更喜欢 OrderBy 的语法:)

Sort ran in 4000 - 5000ms while OrderBy ran just above 5000ms. So indeed my conclusion was wrong. Both of them performed on equal terms once I started to enumerate the collections. I prefer the syntax of OrderBy anyday :)

我刚刚发现这与关于延迟执行的更有趣的问题一个>虽然不是关于完全订购.

Edit 2: I just found that this is exact duplicate of this one. But here is a more interesting question about deferred execution in general though not about ordering altogether.

推荐答案

在这种情况下, OrderBy 要快得多,因为您实际上没有执行它.

In this case, OrderBy is far faster because you're not actually executing it.

在您枚举结果之前,查询是 deferred ,因此它实际上从未进行过排序.在您实际枚举结果之前, IOrderedEnumerable< T> 不会处理输入,也不会进行任何形式的排序.

Until you enumerate the results, the query is deferred, so it's never actually doing the ordering. Until you actually enumerate through the results, the IOrderedEnumerable<T> doesn't process the input and do any form of ordering.

尝试将基准更改为:

 BenchMark(persons => people = persons.OrderBy(n => n.Name).Count());

Count()调用将强制实际执行排序(因为它需要枚举 IOrderedEnumerable< T> 来生成一个计数),该计数应该均匀您的时间安排很重要.

The Count() call will force the ordering to actually occur (since it needs to enumerate the IOrderedEnumerable<T> to generate a count), which should even out your timings significantly.

大多数LINQ扩展方法都是以这种方式工作的-直到您枚举它们(通过 Count(),调用 ToList()或仅在常规中使用它们)foreach 循环等),它们的影响可以忽略不计,因为除了构建可枚举对象之外,它们实际上没有直接做任何事情.其他基准与 OrderBy(...).ToList()相比的原因是,添加了 ToList()完全执行并实际订购结果.

Most LINQ extension methods work this way - until you enumerate them (via Count(), calling ToList(), or just using them in a normal foreach loop, etc), they will have negligible impact, as they don't actually do anything directly other than build the enumerable. The reason the other benchmarks compare to OrderBy(...).ToList() is that the addition of ToList() forces the OrderBy to fully execute and actually order the results.

这篇关于为什么OrderBy返回IOrderedEnumerable&lt; T&gt;比排序快得多?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆