Optimizing C# large dataset iterations - External code in profiler and weird behavior

Views: 17 Published: 2021/11/24 20:31:42 Tags: c# performance collections .net-core cpu

Question

My current task, iterating over massive dictionaries, is giving me a headache. I cannot pinpoint the exact source of the high CPU usage, so I hope some of the C# gurus here can give me some hints and tips.

The setup is 10 preallocated Guid-byte[] dictionaries, each holding one million entries. The process iterates over all of them, and each dictionary has its own thread. Simply iterating over all of them and passing the byte[] reference to the iteration delegate, yielding a random result, takes under 2 ms, but actually accessing any byte in the contained entries causes this number to rise to 300+ ms.

Note: The iteration delegate is constructed before any iterations, and after that I only pass a reference to it.

If I'm not doing anything with the received byte reference, it's all incredibly fast:

            var iterationDelegate = new Action<byte[]>((bytes) =>
            {
                var x = 5 + 10;
            });

But once I attempt to access the very first byte (which actually contains a pointer to the row's metadata stored elsewhere):

            var iterationDelegate = new Action<byte[]>((bytes) =>
            {
                var b = (int)bytes[0];
            });

The total time shoots up and, weirder still, the first set of iterations takes 30 ms, the second 40+ ms, the third 100+ ms, and the fourth can take 500+ ms. Then I stop testing the performance, Sleep the calling thread for a few seconds, and once I start iterating again it starts back at 30 ms and then rises the same way as before, until I give it "time to breathe" again.

When I look at it in the VS CPU call tree, 93% of the CPU is consumed by [External Code], which I cannot open or otherwise identify.

Is there anything I can do about this? Is the GC having a rough time?

Edit 1: The actual code I want to run is:

            var iterationDelegate = new Action<byte[]>((data) =>
            {
                //compare two bytes, ensure the row belongs to desired table
                if (data[0] != table.TableIndex)
                    return;

                //get header length
                var headerLength = (int)data[1];

                //process the header info and retrieve the desired column data position:

                var columnInfoPos = (key * 6) + 2;

                var pointers = new int[3] {
                    //data position
                BitConverter.ToInt32(new byte[4] {
                    data[columnInfoPos],
                    data[columnInfoPos + 1],
                    data[columnInfoPos + 2],
                    data[columnInfoPos + 3] }),
                    //data length
                BitConverter.ToUInt16(new byte[2] {
                    data[columnInfoPos + 4],
                    data[columnInfoPos + 5] }),
                //column info position
                columnInfoPos };


            });

But this code is even slower; the iteration times are ~150, ~300, ~600, and 700+ ms.

This is the worker class that is kept alive for each store, each on its own thread:

            class PartitionWorker
            {
                private ManualResetEvent waitHandle = new ManualResetEvent(true);
                private object key = new object();
                private bool stop = false;
                private List<Action> queue = new List<Action>();

                public void AddTask(Action task)
                {
                    lock (key)
                        queue.Add(task);
                    waitHandle.Set();
                }

                public void Run()
                {
                    while (!stop)
                    {
                        lock (key)
                            if (queue.Count > 0)
                            {
                                var task = queue[0];
                                task();
                                queue.Remove(task);
                                continue;
                            }
                        waitHandle.Reset();
                        waitHandle.WaitOne();
                    }
                }

                public void Stop()
                {
                    stop = true;
                }
            }

And lastly, the code that launches the iterations. It is run from a Task for each incoming TCP request.

            for (var memoryPartition = 0; memoryPartition < partitions; memoryPartition++)
            {
                var memIndex = memoryPartition;
                mem[memIndex].AddJob(() =>
                {
                    try
                    {
                        //... to keep it short I have excluded the read lock and try/finally
                        foreach (var obj in mem[memIndex].innerCache.Values)
                        {
                            iterationDelegate(obj.bytes);
                        }
                        //release readlock in finally..
                    }
                    catch
                    {

                    }
                    finally
                    {
                        latch.Signal();
                    }
                });
            }
            try
            {
                latch.Wait(50);
                sw.Stop();
                Console.WriteLine("Found " + result.Count + " in " + sw.Elapsed.TotalMilliseconds + "ms");
            }
            catch
            {
                Console.WriteLine(">50");
            }

The dictionaries are declared as:

private Dictionary<Guid, byte[]> innerCache = new Dictionary<Guid, byte[]>(part_max_entries);

Regarding the entries, they are 70 bytes on average. The process takes around 2 GB of memory with 10,000,000 entries split among the 10 dictionaries.

The entries are structured as follows:

T | HL | {POS | POS | POS | POS | LEN | LEN} | {data bytes}

where | separates individual bytes:

T is a byte pointer into the table metadata dictionary
HL is the byte length of the header part of the entry

POS and LEN repeat for each data value in the entry:

POSx4 = int giving the position of this data in the entry
LENx2 = ushort giving the length of this data in the entry

and then {data bytes} is the data payload.

Recommended answer

For those who might be wondering, the greatest performance gain came from hot spinning instead of sleeping/delaying/WaitHandles. The CPU hit is negligible even with a large number of parallel requests. For very intensive operations there is a fallback: if the spinning takes longer than 3 ms, it falls back to a thread wait. The code now runs at a fairly constant 24 ms per 10 million entries. Removing anything that triggers GC collections from the code and recycling as many variables as I can was also beneficial.
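As an illustration of the GC point, here is a minimal sketch of reading one column descriptor without allocating temporary arrays, assuming the entry layout described in the question (T, HL, then 6-byte POS/LEN groups). The names `RowHeader` and `TryGetColumn` are hypothetical and not part of the original code. `BitConverter.ToInt32(byte[], int)` and `BitConverter.ToUInt16(byte[], int)` read directly from an offset, so the per-entry `new byte[4]`, `new byte[2]` and `new int[3]` allocations from Edit 1 are not needed.

```csharp
using System;

static class RowHeader
{
    // Assumed layout (from the question): [T][HL]{POS x4, LEN x2}...{data bytes}
    // Returns false when the row belongs to a different table.
    public static bool TryGetColumn(byte[] data, byte tableIndex, int column,
                                    out int position, out ushort length)
    {
        position = 0;
        length = 0;

        // Same table check as the original delegate.
        if (data[0] != tableIndex)
            return false;

        // Each column descriptor is 6 bytes and the first one starts at offset 2.
        var columnInfoPos = (column * 6) + 2;

        // Read in place; no temporary arrays are created.
        position = BitConverter.ToInt32(data, columnInfoPos);
        length = BitConverter.ToUInt16(data, columnInfoPos + 4);
        return true;
    }
}
```

With 10 million entries per pass, the three small arrays per entry in the original delegate add up to tens of millions of short-lived objects, which is a plausible (though unconfirmed) contributor to the rising iteration times.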

Here's the spinner code I use:

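    private static void spin(ref Stopwatch sw, double spinSeconds)
    {
        // Restart so time accumulated by earlier calls does not count toward this spin.
        sw.Restart();
        // Compare like units: convert the requested seconds into Stopwatch ticks.
        var spinTicks = (long)(spinSeconds * Stopwatch.Frequency);
        while (sw.ElapsedTicks < spinTicks) { }
        sw.Stop();
    }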

Note: This can only be used with code that runs on its own thread! If you use it in a single-threaded application, all of your code will block here.
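The fallback described above (spin first, then fall back to a thread wait after about 3 ms) could look roughly like the following. This is only a sketch under stated assumptions: `HybridWait`, `SpinThenWait`, and the parameters `ready`, `signal`, and `spinBudget` are hypothetical names, and the answer does not show how its actual fallback is implemented.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class HybridWait
{
    // Busy-waits until `ready` returns true or `spinBudget` elapses,
    // then blocks on `signal`. Returns true if the condition was
    // observed while still spinning, false if it fell back to waiting.
    public static bool SpinThenWait(Func<bool> ready,
                                    ManualResetEventSlim signal,
                                    TimeSpan spinBudget)
    {
        var sw = Stopwatch.StartNew();
        while (sw.Elapsed < spinBudget)
        {
            if (ready())
                return true;

            // Short busy-wait between checks of the condition.
            Thread.SpinWait(20);
        }

        // Spin budget exhausted: give up the CPU until the producer signals.
        signal.Wait();
        return false;
    }
}
```

A worker thread would call this with a 3 ms budget, and the producer would call `signal.Set()` after queuing work. Resetting the event is left to the caller in this sketch.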

It's also worth noting that, for some reason, rewriting the for loop so that it counts down to 0 had a significant performance impact. I don't know the exact mechanics behind this, but I assume comparing against zero is simply faster.

I also modified the dictionary; it is now a Dictionary<Guid, int>. I added a byte[][] array, and the int stored in the dictionary is an index into this array. Iterating over this array is far faster than enumerating the dictionary's elements. I did need to implement some mechanics to keep the two consistent, though.
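A minimal sketch of that arrangement follows, assuming a fixed capacity and a swap-with-last removal to keep the array dense. The class name `IndexedStore`, the reverse `keys` array, and the removal strategy are assumptions made for illustration; the answer does not say how consistency is actually maintained.

```csharp
using System;
using System.Collections.Generic;

sealed class IndexedStore
{
    private readonly Dictionary<Guid, int> index; // Guid -> slot in rows
    private readonly byte[][] rows;               // dense row storage
    private readonly Guid[] keys;                 // slot -> Guid, needed for swap-remove
    private int count;

    public IndexedStore(int capacity)
    {
        index = new Dictionary<Guid, int>(capacity);
        rows = new byte[capacity][];
        keys = new Guid[capacity];
    }

    public int Count => count;

    public void Add(Guid id, byte[] row)
    {
        if (count == rows.Length)
            throw new InvalidOperationException("Store is full.");
        index.Add(id, count); // throws on a duplicate key before any state changes
        rows[count] = row;
        keys[count] = id;
        count++;
    }

    public byte[] Get(Guid id) => rows[index[id]];

    public bool Remove(Guid id)
    {
        if (!index.Remove(id, out var slot))
            return false;

        var last = --count;
        if (slot != last)
        {
            // Move the last row into the freed slot so the array stays dense.
            rows[slot] = rows[last];
            keys[slot] = keys[last];
            index[keys[slot]] = slot;
        }
        rows[last] = null;
        return true;
    }

    // Iterates the dense array directly, counting down to zero.
    public void ForEach(Action<byte[]> visit)
    {
        for (var i = count - 1; i >= 0; i--)
            visit(rows[i]);
    }
}
```

Iteration touches only a contiguous array of references rather than the dictionary's internal buckets and entries. This sketch is not thread-safe, so the read lock mentioned in the question would still be required around `ForEach`.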
