How can I maximize the performance of element-wise operations on a big array in C#?


Question

The operation multiplies every i-th element of an array (call it A) by the i-th element of a matrix of the same size (B), and stores the resulting value back into the same i-th element of A.

As an arithmetic formula: A'[i] = A[i] * B[i]  (0 <= i < n(A))

What's the best way to optimize this operation in a multi-core environment?

Here's my current code:

var learningRate = 0.001f;
var m = 20000;
var n = 40000;
// Note: 800 million floats is about 3.2 GB per array, which exceeds the default
// 2 GB object limit on .NET Framework unless gcAllowVeryLargeObjects is enabled.
var W = new float[m * n];
var C = new float[m * n];

// My current code ... [1]
Parallel.ForEach(Enumerable.Range(0, m), i =>
{
    for (int j = 0; j <= n - 1; j++)
    {
        W[i * n + j] *= C[i * n + j];
    }
});

// This is somehow far slower than [1], but I don't know why ... [2]
Parallel.ForEach(Enumerable.Range(0, n * m), i =>
{
    W[i] *= C[i];
});

// This is faster than [2], but not as fast as [1] ... [3]
for (int i = 0; i < m * n; i++)
{
    W[i] *= C[i];
}


I tested the approach described at the following link, but the performance did not fundamentally improve.
http://msdn.microsoft.com/en-us/library/dd560853.aspx

    public static void Test1()
    {
        Random rnd = new Random(1);

        var sw1 = new Stopwatch();
        var sw2 = new Stopwatch();
        sw1.Reset();
        sw2.Reset();

        int m = 10000;
        int n = 20000;
        int loops = 20;

        var W = DummyDataUtils.CreateRandomMat1D(m, n);
        var C = DummyDataUtils.CreateRandomMat1D(m, n);

        for (int l = 0; l < loops; l++)
        {
            var v = DummyDataUtils.CreateRandomVector(n);
            var b = DummyDataUtils.CreateRandomVector(m);

            sw1.Start();

            Parallel.ForEach(Enumerable.Range(0, m), i =>
            {
                for (int j = 0; j <= n - 1; j++)
                {
                    W[i*n+j] *= C[i*n+j];
                }
            });
            sw1.Stop();

            sw2.Start();
            // Partition the entire source array. 
            var rangePartitioner = Partitioner.Create(0, n*m);

            // Loop over the partitions in parallel.
            Parallel.ForEach(rangePartitioner, (range, loopState) =>
            {
                // Loop over each range element without a delegate invocation. 
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    W[i] *= C[i];
                }
            });

            sw2.Stop();

            Console.Write("o");
        }

        var t1 = (double)sw1.ElapsedMilliseconds / loops;
        var t2 = (double)sw2.ElapsedMilliseconds / loops;

        Console.WriteLine("t1: " + t1);
        Console.WriteLine("t2: " + t2);
    }


Results (average milliseconds per loop iteration):

t1: 119
t2: 120.4

Recommended answer

The problem is that while invoking a delegate is relatively fast, the cost adds up when you invoke it many times and the code inside the delegate is very simple.
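To make the invocation counts concrete, here is a rough sketch that counts how often the loop-body delegate runs in each variant. The class `DelegateCountDemo` and the method `CountInvocations` are hypothetical names introduced only for this illustration, and the `Interlocked.Increment` calls exist purely for counting (they would distort any timing measurement).

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DelegateCountDemo
{
    // Hypothetical helper: returns how many times the loop-body delegate ran
    // for the per-element, per-row, and per-range variants.
    // Assumes m * n > 0, because Partitioner.Create rejects an empty range.
    public static (long PerElement, long PerRow, long PerRange) CountInvocations(int m, int n)
    {
        var w = new float[m * n];
        var c = new float[m * n];
        long perElement = 0, perRow = 0, perRange = 0;

        // Like [2]: one delegate invocation per array element.
        Parallel.ForEach(Enumerable.Range(0, m * n), i =>
        {
            Interlocked.Increment(ref perElement);
            w[i] *= c[i];
        });

        // Like [1]: one delegate invocation per row of n elements.
        Parallel.ForEach(Enumerable.Range(0, m), i =>
        {
            Interlocked.Increment(ref perRow);
            for (int j = 0; j < n; j++) w[i * n + j] *= c[i * n + j];
        });

        // Range partitioning: one delegate invocation per index range.
        Parallel.ForEach(Partitioner.Create(0, m * n), range =>
        {
            Interlocked.Increment(ref perRange);
            for (int i = range.Item1; i < range.Item2; i++) w[i] *= c[i];
        });

        return (perElement, perRow, perRange);
    }
}
```

For m rows of n elements, the first count is m * n and the second is m. The third depends on how the default partitioner splits the range, so it varies by machine.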

What you could try instead is to use a Partitioner to specify the range you want to iterate over, which lets each delegate invocation process many items (similar to what you're doing in [1]):

Parallel.ForEach(Partitioner.Create(0, n * m), partition =>
    {
        for (int i = partition.Item1; i < partition.Item2; i++)
        {
            W[i] *= C[i];
        }
    });
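For anyone who wants to try this in isolation, the snippet above can be wrapped into a small self-contained program, sketched below. The class name `ElementWise`, the method name `MultiplyInPlace`, and the test data in `Main` are assumptions made for this example and are not part of the original question or answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ElementWise
{
    // Hypothetical wrapper around the range-partitioned loop: w[i] *= c[i] for all i.
    public static void MultiplyInPlace(float[] w, float[] c)
    {
        if (w.Length != c.Length)
            throw new ArgumentException("Arrays must have the same length.");
        if (w.Length == 0)
            return; // Partitioner.Create throws when the range is empty.

        Parallel.ForEach(Partitioner.Create(0, w.Length), range =>
        {
            // Each delegate invocation handles a contiguous block of indices.
            for (int i = range.Item1; i < range.Item2; i++)
            {
                w[i] *= c[i];
            }
        });
    }

    public static void Main()
    {
        const int length = 1000; // small illustrative size, not the asker's 20000 x 40000
        var w = new float[length];
        var c = new float[length];
        var expected = new float[length];

        for (int i = 0; i < length; i++)
        {
            w[i] = i;
            c[i] = 0.5f;
            expected[i] = w[i] * c[i]; // sequential reference result
        }

        MultiplyInPlace(w, c);

        bool same = true;
        for (int i = 0; i < length; i++)
        {
            if (w[i] != expected[i]) same = false;
        }
        Console.WriteLine("Parallel result matches sequential result: " + same);
    }
}
```

Each range writes to a disjoint block of indices, so no locking is needed. Note that the asker's own measurement (t1 and t2 above) showed the per-row loop and the range-partitioned loop performing about the same, which is consistent with both variants already amortizing the delegate cost over many elements.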
