Why doesn't my threaded .Net app scale linearly when allocating large amounts of memory?


Problem Description


    I’ve run into something strange about the effect of large memory allocations on the scalability of the .Net runtime. In my test application I create lots of strings in a tight loop for a fixed number of cycles and spit out a rate of loop iterations per second. The weirdness comes in when I run this loop in several threads – it appears that the rate does not increase linearly. The problem gets even worse when you create large strings.

    Let me show you the results. My machine is an 8 GB, 8-core box running Windows Server 2008 R1, 32-bit. It has two 4-core Intel Xeon 1.83 GHz (E5320) processors. The "work" performed is a set of alternating calls to ToUpper() and ToLower() on a string. I run the test for one thread, two threads, etc., up to the maximum. The columns in the table below are:

    • Rate: The number of loops across all threads divided by the duration.
    • Linear Rate: The ideal rate if performance were to scale linearly. It is calculated as the rate achieved by one thread multiplied by the number of threads for that test.
    • Variance: Calculated as the percentage by which the rate falls short of the linear rate.
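As a quick check of the column definitions above, here is a minimal sketch that recomputes the last row of the first table. The class and method names (`ScalingMath`, `LinearRate`, `VariancePercent`) are made up for illustration; the arithmetic mirrors the expressions used in `Main` in the source code, using `double` instead of `float`.

```csharp
using System;

public static class ScalingMath
{
    // Ideal rate if throughput scaled perfectly with the thread count.
    public static double LinearRate(double singleThreadRate, int threads)
    {
        return singleThreadRate * threads;
    }

    // Percentage by which the measured rate falls short of the linear rate.
    // A negative value means the run beat the linear projection.
    public static double VariancePercent(double rate, double linearRate)
    {
        return (1 - rate / linearRate) * 100;
    }

    public static void Main()
    {
        // Single-thread rate and 8-thread rate taken from Example 1.
        double linear = LinearRate(322.58, 8);
        double variance = VariancePercent(2051.28, linear);
        Console.WriteLine("{0:#0.00} {1:#0.00} %", linear, variance);
    }
}
```

With the Example 1 figures this gives a variance of roughly 20.5 %, in line with the table. The same formula explains the negative entry for two threads: the measured rate (689.66) is above the linear projection (645.16).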

    Example 1: 10,000 loops, 8 threads, 1024 chars per string

    The first example starts off with one thread, then two threads and eventually runs the test with eight threads. Each thread creates 10,000 strings of 1024 chars each:

    Creating 10000 strings per thread, 1024 chars each, using up to 8 threads
    GCMode = Server
    
    Rate          Linear Rate   % Variance    Threads
    --------------------------------------------------------
    322.58        322.58        0.00 %        1
    689.66        645.16        -6.90 %       2
    882.35        967.74        8.82 %        3
    1081.08       1290.32       16.22 %       4
    1388.89       1612.90       13.89 %       5
    1666.67       1935.48       13.89 %       6
    2000.00       2258.07       11.43 %       7
    2051.28       2580.65       20.51 %       8
    Done.
    

    Example 2: 10,000 loops, 8 threads, 32,000 chars per string

    In the second example I’ve increased the number of chars for each string to 32,000.

    Creating 10000 strings per thread, 32000 chars each, using up to 8 threads
    GCMode = Server
    
    Rate          Linear Rate   % Variance    Threads
    --------------------------------------------------------
    14.10         14.10         0.00 %        1
    24.36         28.21         13.64 %       2
    33.15         42.31         21.66 %       3
    40.98         56.42         27.36 %       4
    48.08         70.52         31.83 %       5
    61.35         84.63         27.51 %       6
    72.61         98.73         26.45 %       7
    67.85         112.84        39.86 %       8
    Done.
    

    Notice the difference in variance from the linear rate; in the second table the actual rate is 39% less than the linear rate.

    My question is: Why does this app not scale linearly?

    My Observations

    False Sharing

    I initially thought that this could be due to False Sharing but, as you’ll see in the source code, I’m not sharing any collections and the strings are quite big. The only overlap that could exist is at the beginning of one string and the end of another.

    Server-mode Garbage Collector

    I’m using gcServer enabled=true so that each core gets its own heap and garbage collector thread.

    Large Object Heap

    I don't think the objects I allocate are being sent to the Large Object Heap, because they are under 85,000 bytes.
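The size claim can be checked with rough arithmetic. The sketch below assumes two bytes per character (.NET strings are UTF-16) and ignores the small per-object overhead, which varies by runtime; `LohEstimate` and `ApproxStringBytes` are hypothetical names used only for this illustration.

```csharp
using System;

public static class LohEstimate
{
    // Objects of 85,000 bytes or more are allocated on the Large Object Heap.
    public const int LohThresholdBytes = 85000;

    // Approximate payload of a string: two bytes per UTF-16 char.
    // The object header and length field are deliberately left out.
    public static long ApproxStringBytes(int charCount)
    {
        return (long)charCount * sizeof(char);
    }

    public static void Main()
    {
        // The two string lengths used in Example 1 and Example 2.
        foreach (int length in new[] { 1024, 32000 })
        {
            long bytes = ApproxStringBytes(length);
            Console.WriteLine("{0} chars ~ {1} bytes, below LOH threshold: {2}",
                length, bytes, bytes < LohThresholdBytes);
        }
    }
}
```

By this estimate a 32,000-character string is about 64,000 bytes, so both test sizes stay below the threshold, although the larger one is not far from it.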

    String Interning

    I thought that string values might be shared under the hood due to interning (MSDN), so I tried compiling with interning disabled. This produced worse results than those shown above.

    Other data types

    I tried the same example using small and large integer arrays, in which I loop through each element and change the value. It produces similar results, following the trend of performing worse with larger allocations.

    Source Code

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading;
    using System.Diagnostics;
    using System.Runtime;
    using System.Runtime.CompilerServices;
    
    namespace StackOverflowExample
    {
      public class Program
      {
        private static int columnWidth = 14;
    
        static void Main(string[] args)
        {
          int loopCount, maxThreads, stringLength;
          loopCount = maxThreads = stringLength = 0;
          try
          {
            loopCount = args.Length != 0 ? Int32.Parse(args[0]) : 1000;
            maxThreads = args.Length != 0 ? Int32.Parse(args[1]) : 4;
            stringLength = args.Length != 0 ? Int32.Parse(args[2]) : 1024;
          }
          catch
          {
            Console.WriteLine("Usage: StackOverFlowExample.exe [loopCount] [maxThreads] [stringLength]");
            System.Environment.Exit(2);
          }
    
          float rate;
          float linearRate = 0;
          Stopwatch stopwatch;
          Console.WriteLine("Creating {0} strings per thread, {1} chars each, using up to {2} threads", loopCount, stringLength, maxThreads);
          Console.WriteLine("GCMode = {0}", GCSettings.IsServerGC ? "Server" : "Workstation");
          Console.WriteLine();
          PrintRow("Rate", "Linear Rate", "% Variance", "Threads");
          PrintRow(4, "".PadRight(columnWidth, '-'));
    
          for (int runCount = 1; runCount <= maxThreads; runCount++)
          {
            // Create the workers
            Worker[] workers = new Worker[runCount];
            workers.Length.Range().ForEach(index => workers[index] = new Worker());
    
            // Start timing and kick off the threads
            stopwatch = Stopwatch.StartNew();
            workers.ForEach(w => new Thread(
              new ThreadStart(
                () => w.DoWork(loopCount, stringLength)
              )
            ).Start());
    
            // Wait until all threads are complete
            WaitHandle.WaitAll(
              workers.Select(p => p.Complete).ToArray());
            stopwatch.Stop();
    
            // Print the results
            rate = (float)loopCount * runCount / stopwatch.ElapsedMilliseconds;
            if (runCount == 1) { linearRate = rate; }
    
            PrintRow(String.Format("{0:#0.00}", rate),
              String.Format("{0:#0.00}", linearRate * runCount),
              String.Format("{0:#0.00} %", (1 - rate / (linearRate * runCount)) * 100),
              runCount.ToString()); 
          }
          Console.WriteLine("Done.");
        }
    
        private static void PrintRow(params string[] columns)
        {
          columns.ForEach(c => Console.Write(c.PadRight(columnWidth)));
          Console.WriteLine();
        }
    
        private static void PrintRow(int repeatCount, string column)
        {
          for (int counter = 0; counter < repeatCount; counter++)
          {
            Console.Write(column.PadRight(columnWidth));
          }
          Console.WriteLine();
        }
      }
    
      public class Worker
      {
        public ManualResetEvent Complete { get; private set; }
    
        public Worker()
        {
          Complete = new ManualResetEvent(false);
        }
    
        public void DoWork(int loopCount, int stringLength)
        {
          // Build the string
          string theString = "".PadRight(stringLength, 'a');
          for (int counter = 0; counter < loopCount; counter++)
          {
            if (counter % 2 == 0) { theString.ToUpper(); }
            else { theString.ToLower(); }
          }
          Complete.Set();
        }
      }
    
      public static class HandyExtensions
      {
        public static IEnumerable<int> Range(this int max)
        {
          for (int counter = 0; counter < max; counter++)
          {
            yield return counter;
          }
        }
    
        public static void ForEach<T>(this IEnumerable<T> items, Action<T> action)
        {
          foreach(T item in items)
          {
            action(item);
          }
        }
      }
    }
    

    App.Config

    <?xml version="1.0" encoding="utf-8" ?>
    <configuration>
      <runtime>
        <gcServer enabled="true"/>
      </runtime>
    </configuration>
    

    Running the Example

    To run StackOverflowExample.exe on your box, call it with these command-line parameters:

    StackOverFlowExample.exe [loopCount] [maxThreads] [stringLength]

    • loopCount: The number of times each thread will manipulate the string.
    • maxThreads: The number of threads to progress to.
    • stringLength: The number of characters to fill the string with.

    Solution

    You may want to look at this question of mine.

    I ran into a similar problem that was due to the fact that the CLR performs inter-thread synchronization when allocating memory to avoid overlapping allocations. Now, with the server GC, the locking algorithm may be different - but something along those same lines may be affecting your code.
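One way to probe the allocation-contention idea, offered as a sketch rather than a confirmed fix, is to swap the worker body for a variant that does similar per-character work but allocates only once per thread. If this variant scales closer to linearly, that points at allocation (and the garbage collections it triggers) rather than the case conversion itself. `AllocationFreeWorker` is a hypothetical stand-in for `Worker.DoWork`, and it uses the invariant-culture `char` conversions, which may differ in cost from `String.ToUpper()`.

```csharp
using System;

public static class AllocationFreeWorker
{
    // Same alternating upper/lower conversion as Worker.DoWork, but the result
    // is written into a buffer allocated once, instead of a new string per loop.
    public static char[] DoWork(int loopCount, int stringLength)
    {
        char[] source = new string('a', stringLength).ToCharArray();
        char[] buffer = new char[stringLength];

        for (int counter = 0; counter < loopCount; counter++)
        {
            bool upper = counter % 2 == 0;
            for (int i = 0; i < source.Length; i++)
            {
                buffer[i] = upper
                    ? char.ToUpperInvariant(source[i])
                    : char.ToLowerInvariant(source[i]);
            }
        }

        // Returning the buffer keeps the work observable to the caller.
        return buffer;
    }
}
```

To try it, call `AllocationFreeWorker.DoWork(loopCount, stringLength)` from the thread delegate in place of `w.DoWork(...)`, keep the `Complete.Set()` signalling, and compare the variance column against the original runs.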
