Why is using AsParallel() slower than foreach in this case?

Problem description

I am extracting data from an Excel file in this format:

 product1   | unnamedcol2 | product2  | unnamedcol4 | product3  | unnamedcol6 |
-------------------------------------------------------------------------------
 @1foo      |        1.10 | @1foo     |         0.3 | @1foo     |         0.3
 @2foo      |        1.00 | @2foo     |           2 | @2foo     |
 @3foo      |        1.52 | @3foo     |        2.53 | @3foo     |
 @4foo      |        1.47 |           |             | @4foo     |        1.31
 @5foo      |        1.49 |           |             | @5foo     |        1.31

The file uses all 255 fields (columns). Using dapper-dot-net, I fetch the data with this code:

IEnumerable<IDictionary<string, object>> excelDataRaw =
                conn.Query(string.Format("select * from {0}", table)).Cast<IDictionary<string, object>>();

I pass this data to the test methods below. The data is returned as an IEnumerable of IDictionaries, where each key is a product and each value is an IDictionary whose keys are the values in that product's column and whose values are the corresponding entries in the unnamed column to its right. For example:

var excelDataRefined = new List<IDictionary<string, IDictionary<string, decimal>>>();
excelDataRefined.Add(new Dictionary<string, IDictionary<string, decimal>>());
excelDataRefined[0].Add( "product", new Dictionary<string, decimal>());
excelDataRefined[0]["product"].Add("@1foo", 1.1m);
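
To make the column pairing concrete, here is a minimal sketch that builds the same product → (item → price) structure from rows shaped like the sheet above, in a single pass over the rows. `ColumnPairing` and `Refine` are hypothetical names, not part of the question's code. It assumes the header list is in the sheet's column order, that price cells arrive as numbers or numeric strings, and that an empty price counts as `0m`, mirroring the question's `?? 0m`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ColumnPairing
{
    // Hypothetical helper: builds product -> (item -> price) in one pass.
    // Column i (even index) holds the item, column i + 1 holds its price.
    public static Dictionary<string, IDictionary<string, decimal>> Refine(
        IReadOnlyList<string> headers,
        IEnumerable<IDictionary<string, object>> rows)
    {
        var result = new Dictionary<string, IDictionary<string, decimal>>();
        for (int i = 0; i + 1 < headers.Count; i += 2)
            result[headers[i]] = new Dictionary<string, decimal>();

        foreach (IDictionary<string, object> row in rows)
        {
            for (int i = 0; i + 1 < headers.Count; i += 2)
            {
                object item = row[headers[i]];
                // Empty item cell: nothing to record. (Dapper normally maps
                // DBNull to null; the DBNull check is defensive.)
                if (item == null || item is DBNull) continue;

                object raw = row[headers[i + 1]];
                // Same convention as the question: an empty price becomes 0m.
                decimal price = raw == null || raw is DBNull
                    ? 0m
                    : Convert.ToDecimal(raw, CultureInfo.InvariantCulture);

                // A repeated item overwrites the earlier one (ToDictionary would throw).
                result[headers[i]][item.ToString()] = price;
            }
        }
        return result;
    }
}
```

The `i + 1 < headers.Count` bound skips a trailing item column that has no column to its right, which is what happens at index 254 when all 255 columns are used.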

The methods:

private static Dictionary<string, IDictionary<string, decimal>> Benchmark_foreach(IEnumerable<IDictionary<string, object>> excelDataRaw)
{
    Console.WriteLine("1. Using foreach");
    var watch = new Stopwatch();
    watch.Start();

    List<string> headers = excelDataRaw.Select(dictionary => dictionary.Keys).First().ToList();
    bool isEven = false;
    List<string> products = headers.Where(h => isEven = !isEven).ToList();
    var dates = new List<IEnumerable<object>>();
    var prices = new List<IEnumerable<object>>();

    foreach (string field in headers)
    {
        string product1 = field;
        if (headers.IndexOf(field) % 2 == 0)
        {
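            // No ToList() here: the PLINQ query is deferred and is
            // re-executed every time it is enumerated.
            dates.Add(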
                excelDataRaw.AsParallel().AsOrdered().Select(col => col[product1]).Where(row => row != null));
        }

        if (headers.IndexOf(field) % 2 == 1)
        {
            prices.Add(
                excelDataRaw.AsParallel().AsOrdered().Select(col => col[product1] ?? 0m).Take(dates.Last().Count()));
        }
    }

    watch.Stop();
    Console.WriteLine("Rearranged the data in: {0}s", watch.Elapsed.TotalSeconds);
    watch.Restart();

    var excelDataRefined = new Dictionary<string, IDictionary<string, decimal>>();
    foreach (IEnumerable<object> datelist in dates)
    {
        decimal num;
        IEnumerable<object> datelist1 = datelist;
        IEnumerable<object> pricelist =
            prices[dates.IndexOf(datelist1)].Select(value => value ?? 0m).Where(
                content => decimal.TryParse(content.ToString(), out num));
        Dictionary<string, decimal> dict =
            datelist1.Zip(pricelist, (k, v) => new { k, v }).ToDictionary(
                x => (string)x.k, x => decimal.Parse(x.v.ToString()));

        if (!excelDataRefined.ContainsKey(products[dates.IndexOf(datelist1)]))
        {
            excelDataRefined.Add(products[dates.IndexOf(datelist1)], dict);
        }
    }

    watch.Stop();
    Console.WriteLine("Zipped the data in: {0}s", watch.Elapsed.TotalSeconds);

    return excelDataRefined;
}
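
In `Benchmark_foreach`, the queries added to `dates` and `prices` are never materialized with `.ToList()`. LINQ and PLINQ queries use deferred execution, so adding a query to a list does no work: it runs when it is enumerated, and runs again on every later enumeration. That is a plausible reason why this method spends about 3 s in the zip phase while the other two methods, which call `.ToList()`, zip in milliseconds. The minimal sketch below (a hypothetical `DeferredDemo` class using plain LINQ over a small list) counts how often a projection actually runs:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how often the projection runs, to show when a LINQ query executes.
    public static (int afterDefine, int afterFirstEnumeration, int afterSecondEnumeration) Run()
    {
        int evaluations = 0;
        var source = new List<int> { 1, 2, 3, 4, 5 };

        // Building the query does no work yet: the query is deferred.
        IEnumerable<int> query = source.Select(x => { evaluations++; return x * 2; });
        int afterDefine = evaluations;

        query.ToList(); // executes the query: 5 evaluations
        int afterFirstEnumeration = evaluations;

        query.ToList(); // executes it again from scratch: 5 more
        int afterSecondEnumeration = evaluations;

        return (afterDefine, afterFirstEnumeration, afterSecondEnumeration);
    }
}
```

`Run()` returns `(0, 5, 10)`: nothing happens when the query is defined, and each enumeration pays the full cost again.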

private static Dictionary<string, IDictionary<string, decimal>> Benchmark_AsParallel(IEnumerable<IDictionary<string, object>> excelDataRaw)
{
    Console.WriteLine("2. Using AsParallel().AsOrdered().ForAll");
    var watch = new Stopwatch();
    watch.Start();

    List<string> headers = excelDataRaw.Select(dictionary => dictionary.Keys).First().ToList();
    bool isEven = false;
    List<string> products = headers.Where(h => isEven = !isEven).ToList();
    var dates = new List<IEnumerable<object>>();
    var prices = new List<IEnumerable<object>>();

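    // Caution: ForAll runs this lambda on several threads at once and does not
    // honor AsOrdered(); List<T>.Add is not thread-safe, so these Add calls race.
    headers.AsParallel().AsOrdered().ForAll(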
        field =>
        dates.Add(
            excelDataRaw.AsParallel().AsOrdered().TakeWhile(x => headers.IndexOf(field) % 2 == 0).Select(
                col => col[field]).Where(row => row != null).ToList()));
    headers.AsParallel().AsOrdered().ForAll(
        field =>
        prices.Add(
            excelDataRaw.AsParallel().AsOrdered().TakeWhile(x => headers.IndexOf(field) % 2 == 1).Select(
                col => col[field] ?? 0m).Take(256).ToList()));
    dates.RemoveAll(x => x.Count() == 0);
    prices.RemoveAll(x => x.Count() == 0);

    watch.Stop();
    Console.WriteLine("Rearranged the data in: {0}s", watch.Elapsed.TotalSeconds);
    watch.Restart();

    var excelDataRefined = new Dictionary<string, IDictionary<string, decimal>>();
    foreach (IEnumerable<object> datelist in dates)
    {
        decimal num;
        IEnumerable<object> datelist1 = datelist;
        IEnumerable<object> pricelist =
            prices[dates.IndexOf(datelist1)].Select(value => value ?? 0m).Where(
                content => decimal.TryParse(content.ToString(), out num));
        Dictionary<string, decimal> dict =
            datelist1.Zip(pricelist, (k, v) => new { k, v }).ToDictionary(
                x => (string)x.k, x => decimal.Parse(x.v.ToString()));

        if (!excelDataRefined.ContainsKey(products[dates.IndexOf(datelist1)]))
        {
            excelDataRefined.Add(products[dates.IndexOf(datelist1)], dict);
        }
    }

    watch.Stop();
    Console.WriteLine("Zipped the data in: {0}s", watch.Elapsed.TotalSeconds);

    return excelDataRefined;
}
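
In `Benchmark_AsParallel`, `ForAll` invokes its lambda on several threads at once and does not preserve the order requested by `AsOrdered()`, while `List<T>.Add` is not safe to call concurrently. The calls to `dates.Add` and `prices.Add` can therefore race, and the two lists are not guaranteed to line up with each other or with `products`. A safer pattern lets PLINQ build the list itself: with `AsOrdered()`, `ToList()` returns results in source order, and the indexed `Select` overload supplies each element's position without calling `headers.IndexOf`. The sketch below shows this with a hypothetical `TagWithIndex` helper:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OrderedParallel
{
    // Hypothetical example: tag each header with its position. PLINQ collects
    // the results itself, so no List<T>.Add runs on multiple threads, and
    // AsOrdered() + ToList() returns them in source order.
    public static List<string> TagWithIndex(IReadOnlyList<string> headers)
    {
        return headers
            .AsParallel()
            .AsOrdered()
            // The (element, index) overload provides the position directly,
            // so there is no linear headers.IndexOf(field) search.
            .Select((field, index) => index + ":" + field)
            .ToList();
    }
}
```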

private static Dictionary<string, IDictionary<string, decimal>> Benchmark_ForEach(IEnumerable<IDictionary<string, object>> excelDataRaw)
{
    Console.WriteLine("3. Using ForEach");
    var watch = new Stopwatch();
    watch.Start();

    List<string> headers = excelDataRaw.Select(dictionary => dictionary.Keys).First().ToList();
    bool isEven = false;
    List<string> products = headers.Where(h => isEven = !isEven).ToList();
    var dates = new List<IEnumerable<object>>();
    var prices = new List<IEnumerable<object>>();

    headers.ForEach(
        field =>
        dates.Add(
            excelDataRaw.TakeWhile(x => headers.IndexOf(field) % 2 == 0).Select(col => col[field]).Where(
                row => row != null).ToList()));
    headers.ForEach(
        field =>
        prices.Add(
            excelDataRaw.TakeWhile(x => headers.IndexOf(field) % 2 == 1).Select(col => col[field] ?? 0m).
            Take(256).ToList()));
    dates.RemoveAll(x => x.Count() == 0);
    prices.RemoveAll(x => x.Count() == 0);

    watch.Stop();
    Console.WriteLine("Rearranged the data in: {0}s", watch.Elapsed.TotalSeconds);
    watch.Restart();

    var excelDataRefined = new Dictionary<string, IDictionary<string, decimal>>();
    foreach (IEnumerable<object> datelist in dates)
    {
        decimal num;
        IEnumerable<object> datelist1 = datelist;
        IEnumerable<object> pricelist =
            prices[dates.IndexOf(datelist1)].Select(value => value ?? 0m).Where(
                content => decimal.TryParse(content.ToString(), out num));
        Dictionary<string, decimal> dict =
            datelist1.Zip(pricelist, (k, v) => new { k, v }).ToDictionary(
                x => (string)x.k, x => decimal.Parse(x.v.ToString()));

        if (!excelDataRefined.ContainsKey(products[dates.IndexOf(datelist1)]))
        {
            excelDataRefined.Add(products[dates.IndexOf(datelist1)], dict);
        }
    }

    watch.Stop();
    Console.WriteLine("Zipped the data in: {0}s", watch.Elapsed.TotalSeconds);

    return excelDataRefined;
}
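
In `Benchmark_ForEach` (and in the PLINQ variant), the `TakeWhile` predicate `x => headers.IndexOf(field) % 2 == 0` never looks at `x`. It either accepts every row or stops at the first one, and for the accepted columns it repeats a linear `IndexOf` search over the headers once per row. Whether a column holds items or prices depends only on its position, so that can be decided once up front. The hypothetical `ColumnSplit.Split` below does so. Pairing by position also avoids a subtle mismatch: `RemoveAll(x => x.Count() == 0)` would drop an entirely empty item column while keeping its (zero-filled) price column, shifting every later pair.

```csharp
using System.Collections.Generic;

public static class ColumnSplit
{
    // Hypothetical helper: classify headers once by position. Even index =
    // item column, odd index = price column. No per-row IndexOf is needed.
    public static (List<string> itemColumns, List<string> priceColumns) Split(IReadOnlyList<string> headers)
    {
        var itemColumns = new List<string>();
        var priceColumns = new List<string>();
        for (int i = 0; i < headers.Count; i++)
        {
            if (i % 2 == 0) itemColumns.Add(headers[i]);
            else priceColumns.Add(headers[i]);
        }
        return (itemColumns, priceColumns);
    }
}
```

With 255 headers this yields 128 item columns and 127 price columns, so `itemColumns[k]` pairs with `priceColumns[k]` for every `k` below 127.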

  • Benchmark_foreach takes approx. 3.5 s to rearrange and 3 s to zip the data.
  • Benchmark_AsParallel takes approx. 12 s to rearrange and 0.005 s to zip the data.
  • Benchmark_ForEach takes approx. 16 s to rearrange and 0.005 s to zip the data.
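
As far as the code shows, each method times a single cold run, so the figures include one-off costs such as JIT compilation alongside the actual work, and run-to-run noise is not averaged out. A small timing helper that discards a warm-up run and averages several timed runs makes comparisons like these easier to trust. `Bench.Measure` is a hypothetical name, sketched here with `Stopwatch` as in the question:

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Hypothetical helper: runs `action` once untimed as a warm-up (JIT
    // compilation and other first-call costs), then returns the mean
    // duration of `iterations` timed runs.
    public static TimeSpan Measure(Action action, int iterations = 5)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations < 1) throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up, not timed

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        watch.Stop();

        return TimeSpan.FromTicks(watch.Elapsed.Ticks / iterations);
    }
}
```

Inside the question's class, something like `Bench.Measure(() => Benchmark_ForEach(excelDataRaw))` would time a whole method, both phases included.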

Why does it behave like this? I expected AsParallel to be the fastest because it executes in parallel instead of sequentially. How do I optimize this?

Solution

For parallel computation to happen, you need multiple processors or cores; otherwise you are just queueing tasks in the thread pool to wait for the CPU. That is, AsParallel on a single-core machine is sequential execution plus the overhead of the thread pool and thread context switches. Even on a dual-core machine you may not get both cores, since many other things are running on the same machine.
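
The point about cores can be made explicit in code. The sketch below (a hypothetical `CoreAware.SumOfSquares`, assuming the per-item work is CPU-bound) checks `Environment.ProcessorCount` and only uses PLINQ when there is more than one logical processor to spread the work across:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CoreAware
{
    // With a single logical processor, PLINQ has nothing to run in parallel
    // on, so fall back to plain LINQ and skip the partition/merge overhead.
    public static long SumOfSquares(IEnumerable<int> values)
    {
        if (Environment.ProcessorCount == 1)
            return values.Sum(v => (long)v * v);

        // By default PLINQ sizes its degree of parallelism from the processor count.
        return values.AsParallel().Sum(v => (long)v * v);
    }
}
```

`SumOfSquares(Enumerable.Range(1, 100))` returns 338350 on either path; only the execution strategy differs.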

In practice, .AsParallel() only becomes useful when you have long-running tasks with blocking operations (I/O), where the OS can suspend the blocked thread and let another one run.
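
The blocking-I/O case can be sketched with `Thread.Sleep` standing in for a synchronous read. `BlockingWork`, `SlowLookup` and the default degree of 8 are hypothetical choices for illustration. `WithDegreeOfParallelism` lets the query use more workers than there are cores, which can pay off when those workers spend most of their time waiting rather than computing:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class BlockingWork
{
    // Hypothetical stand-in for a blocking call (e.g. a synchronous file or
    // network read): the thread waits, leaving the CPU free for others.
    private static int SlowLookup(int id)
    {
        Thread.Sleep(100);
        return id * 10;
    }

    // Because the workers mostly block, a degree of parallelism above the
    // core count can still shorten the wall-clock time of the whole query.
    public static List<int> LookupAll(IEnumerable<int> ids, int degreeOfParallelism = 8)
    {
        return ids.AsParallel()
                  .AsOrdered()
                  .WithDegreeOfParallelism(degreeOfParallelism)
                  .Select(id => SlowLookup(id))
                  .ToList();
    }
}
```

Because of `AsOrdered()`, `LookupAll(Enumerable.Range(0, 8))` returns 0, 10, ..., 70 in input order. How many workers actually run at once also depends on how quickly the thread pool adds threads.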
