C# slow filter on string array (LINQ vs. Loop)


Problem description


A string array with 8000 items has to be filtered into items starting with 'a' (=600 items).

Using LINQ is very slow.
Using a simple loop is very fast.

Is there a way to improve the LINQ performance in this case?

What I have tried:

string[] items = { ... 8000 items from csv ...};
var foo = new List<string>();

// 3 seconds..
foo.AddRange(items.Where(item => item.StartsWith("a")));

// <1 second
foreach (var item in items)
    if (item.StartsWith("a")) foo.Add(item);

Solution

LINQ is not designed for performance; it is designed for ease of use. If you are dealing with a lot of data and performance is important, then use the foreach loop.


I suspect there's something else going on with your code, or with your computer. Three seconds to filter an array of 8000 items is unbelievably slow.
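
A quick way to check this locally, before reaching for a full benchmark, is to time both versions with a Stopwatch after a warm-up run. The following is only a minimal sketch: the class name QuickTiming, the helper MakeItems, and the random two-letter test data are all assumptions standing in for the real CSV items, and a single timed run is far less reliable than a proper benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class QuickTiming
{
    // Hypothetical stand-in data: 'count' random two-letter lowercase strings,
    // so roughly 1 in 26 items starts with 'a'.
    public static string[] MakeItems(int count, int seed)
    {
        var rnd = new Random(seed);
        var items = new string[count];
        for (int i = 0; i < count; i++)
        {
            items[i] = new string(new[] { (char)('a' + rnd.Next(26)), (char)('a' + rnd.Next(26)) });
        }
        return items;
    }

    // Same shape as the LINQ version in the question.
    public static List<string> FilterWithLinq(string[] items)
    {
        var result = new List<string>();
        result.AddRange(items.Where(item => item.StartsWith("a")));
        return result;
    }

    // Same shape as the loop version in the question.
    public static List<string> FilterWithLoop(string[] items)
    {
        var result = new List<string>();
        foreach (var item in items)
        {
            if (item.StartsWith("a")) result.Add(item);
        }
        return result;
    }

    // Runs the action once untimed (so JIT compilation is not measured),
    // then times a second run.
    public static TimeSpan Time(Action action)
    {
        action();
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        string[] items = MakeItems(8000, 42);
        Console.WriteLine("LINQ: " + Time(() => FilterWithLinq(items)));
        Console.WriteLine("Loop: " + Time(() => FilterWithLoop(items)));
    }
}
```

If the two numbers come out close to each other here but far apart in the real program, the difference is probably coming from somewhere other than the filter itself.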

Using BenchmarkDotNet to test the following code:

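using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class LinqBenchmark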
{
    private readonly string[] _data;

    public LinqBenchmark()
    {
        _data = new string[8000];

        var rnd = new Random();
        for (int i = 0; i < _data.Length; i++)
        {
            _data[i] = rnd.Next(0, 1000).ToString();
        }
    }

    [Benchmark]
    public List<string> For()
    {
        var result = new List<string>();
        for (int i = 0; i < _data.Length; i++)
        {
            if (_data[i].StartsWith("1"))
            {
                result.Add(_data[i]);
            }
        }

        return result;
    }

    [Benchmark]
    public List<string> ForEach()
    {
        var result = new List<string>();
        foreach (string item in _data)
        {
            if (item.StartsWith("1"))
            {
                result.Add(item);
            }
        }

        return result;
    }

    [Benchmark]
    public List<string> AddRange()
    {
        var result = new List<string>();
        result.AddRange(_data.Where(i => i.StartsWith("1")));
        return result;
    }

    [Benchmark]
    public List<string> ToList()
    {
        return _data.Where(i => i.StartsWith("1")).ToList();
    }
}

The results:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.285 (1803/April2018Update/Redstone4)
Intel Core i7-4770K CPU 3.50GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
Frequency=3417969 Hz, Resolution=292.5714 ns, Timer=TSC
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.3163.0
  DefaultJob : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.3163.0


   Method |     Mean |     Error |    StdDev |
--------- |---------:|----------:|----------:|
      For | 772.1 us | 10.709 us | 10.017 us |
  ForEach | 766.5 us |  1.323 us |  1.238 us |
 AddRange | 796.3 us |  2.858 us |  2.534 us |
   ToList | 798.5 us | 12.835 us | 12.005 us |

// * Hints *
Outliers
  LinqBenchmark.AddRange: Default -> 1 outlier  was  removed

// * Legends *
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements
  1 us   : 1 Microsecond (0.000001 sec)


As you can see, none of the methods took more than one millisecond on average. The difference between them is less than 30 microseconds.

LINQ may not be the most performant option, but the difference isn't usually anywhere near as bad as the times you're quoting.
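
One detail that applies equally to the loop and the LINQ versions: StartsWith(string) with no comparison argument performs a culture-sensitive comparison. If a plain character-by-character prefix match is what is wanted (an assumption about the asker's data), passing StringComparison.Ordinal is usually cheaper. The sketch below uses a hypothetical class OrdinalPrefixFilter and made-up sample items. It makes no claim about how much time this saves, which would need measuring.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrdinalPrefixFilter
{
    // Filters with an ordinal (byte-wise, case-sensitive) prefix check
    // instead of the culture-sensitive default of StartsWith(string).
    public static List<string> WhereStartsWith(IEnumerable<string> items, string prefix)
    {
        return items.Where(item => item.StartsWith(prefix, StringComparison.Ordinal)).ToList();
    }

    public static void Main()
    {
        // Made-up sample data for illustration only.
        string[] items = { "apple", "banana", "avocado", "Apricot", "cherry" };
        List<string> matches = WhereStartsWith(items, "a");
        Console.WriteLine(string.Join(", ", matches)); // prints "apple, avocado"
    }
}
```

Note that an ordinal comparison is case-sensitive, so "Apricot" is not matched. StringComparison.OrdinalIgnoreCase would be the alternative if case should be ignored.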


Really late to answer here, but I was curious what a slightly more rigorous exploration would yield. The method used to analyze/select your population, for such a small population (8K items), with such a simple filter (starts with 'a'), should be largely irrelevant. Even with a population of 100M items, the worst I could do was about 2 seconds.

Instead, it's far more likely that your problem is with the generation/reading of your items.

I am guessing you must have an exceptionally slow CSV reader. Even for the comparatively small 8K item size, unless you have a huge character size per item, I would expect fast performance from even the worst CSV reader (of which I am aware).
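
To see whether reading or filtering dominates, the two stages can be timed separately. The following is a rough sketch under stated assumptions: the class StageTiming is hypothetical, File.ReadAllLines on a generated temp file stands in for whatever CSV reader is actually in use, and the sample data (8000 lines, about 600 of them starting with 'a') is made up to match the sizes in the question.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class StageTiming
{
    // Stand-in for the real CSV reader: one item per line.
    public static string[] ReadItems(string path)
    {
        return File.ReadAllLines(path);
    }

    // The filter from the question, materialized so the work happens here.
    public static string[] FilterItems(string[] items)
    {
        return items.Where(item => item.StartsWith("a")).ToArray();
    }

    public static void Main()
    {
        // Hypothetical sample file: every 13th line starts with 'a'.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, Enumerable.Range(0, 8000).Select(i => (i % 13 == 0 ? "a" : "b") + i));

        try
        {
            // Stage 1: reading only.
            var sw = Stopwatch.StartNew();
            string[] items = ReadItems(path);
            TimeSpan readTime = sw.Elapsed;

            // Stage 2: filtering only.
            sw.Restart();
            string[] matches = FilterItems(items);
            TimeSpan filterTime = sw.Elapsed;

            Console.WriteLine($"Read {items.Length} items in {readTime}");
            Console.WriteLine($"Filtered to {matches.Length} items in {filterTime}");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Swapping the body of ReadItems for the real reader would show directly whether the three seconds are spent before the filter ever runs.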

Results and test harness follow...

Results:

StartsWithA: 00:00:31.3843456
GetMatchesIndexCount: 00:00:00.4149453
GetMatchesIndexListAdHoc: 00:00:00.4930803
GetMatchesIndexListPreAllocated: 00:00:00.4762712
GetMatchesIndexListTruncated: 00:00:00.4896025
GetMatchesForeachCount: 00:00:00.4298655
GetMatchesForeachListAdHoc: 00:00:00.4599720
GetMatchesForeachListPreAllocated: 00:00:00.4488830
GetMatchesForeachListTruncated: 00:00:02.0583127
GetMatchesLinqArray: 00:00:00.5453610
GetMatchesLinqList: 00:00:00.4848105

Program:

using System;
using System.Linq;
using System.Text;
using System.Collections.Generic;
using System.Diagnostics;

namespace PrefixBenchmark
{
  public class Program
  {
    public const int OddsOfA = 26; // 1:26 odds of starting with "a".
    public const int SampleCount = 100000000; // Number of strings to use while testing
    public const int MaximumLength = 5; // Maximum string length

    public static void Main(string[] args)
    {
      var stopwatch = new Stopwatch();

      stopwatch.Restart();
      string[] sequence = StartsWithA(SampleCount, new Random(), OddsOfA, MaximumLength);
      stopwatch.Stop();
      Console.WriteLine(

