Why would I see ~20% speed increase using native code?

Question

Any idea why this code:

#include <math.h>  /* pow, sqrt */

extern "C" __declspec(dllexport) void Transform(double x[], double y[], int iterations, bool forward)
{
    long n, i, i1, j, k, i2, l, l1, l2;
    double c1, c2, tx, ty, t1, t2, u1, u2, z;

    /* Calculate the number of points */
    n = (long)pow((double)2, (double)iterations);

    /* Do the bit reversal */
    i2 = n >> 1;
    j = 0;
    for (i = 0; i < n - 1; ++i)
    {
    	if (i < j)
    	{
    		tx = x[i];
    		ty = y[i];
    		x[i] = x[j];
    		y[i] = y[j];
    		x[j] = tx;
    		y[j] = ty;
    	}
    	k = i2;
    	while (k <= j)
    	{
    		j -= k;
    		k >>= 1;
    	}
    	j += k;
    }

    /* Compute the FFT */
    c1 = -1.0; 
    c2 = 0.0;
    l2 = 1;
    for (l = 0; l < iterations; ++l)
    {
    	l1 = l2;
    	l2 <<= 1;
    	u1 = 1; 
    	u2 = 0;
    	for (j = 0; j < l1; j++) 
    	{
    		for (i = j; i < n; i += l2) 
    		{
    			i1 = i + l1;
    			t1 = u1 * x[i1] - u2 * y[i1];
    			t2 = u1 * y[i1] + u2 * x[i1];
    			x[i1] = x[i] - t1; 
    			y[i1] = y[i] - t2;
    			x[i] += t1;
    			y[i] += t2;
    		}
    		z = u1 * c1 - u2 * c2;
    		u2 = u1 * c2 + u2 * c1;
    		u1 = z;
    	}
    	c2 = sqrt((1.0 - c1) / 2.0);
    	if (forward) 
    		c2 = -c2;
    	c1 = sqrt((1.0 + c1) / 2.0);
    }

    /* Scaling for forward transform */
    if (forward)
    {
    	for (i = 0; i < n; ++i)
    	{
    		x[i] /= n;
    		y[i] /= n;
    	}
    }
}



runs 20% faster than this code?

public static void Transform(DataSet data, Direction direction)
{
    double[] x = data.Real;
    double[] y = data.Imag;
    data.Direction = direction;
    data.ExtremeImag = 0.0;
    data.ExtremeReal = 0.0;
    data.IndexExtremeImag = 0;
    data.IndexExtremeReal = 0;

    long n, i, i1, j, k, i2, l, l1, l2;
    double c1, c2, tx, ty, t1, t2, u1, u2, z;

    /* Calculate the number of points */
    n = (long)Math.Pow(2, data.Iterations);

    /* Do the bit reversal */
    i2 = n >> 1;
    j = 0;
    for (i = 0; i < n - 1; ++i)
    {
        if (i < j)
        {
            tx = x[i];
            ty = y[i];
            x[i] = x[j];
            y[i] = y[j];
            x[j] = tx;
            y[j] = ty;
        }
        k = i2;
        while (k <= j)
        {
            j -= k;
            k >>= 1;
        }
        j += k;
    }

    /* Compute the FFT */
    c1 = -1.0; 
    c2 = 0.0;
    l2 = 1;
    for (l = 0; l < data.Iterations; ++l)
    {
        l1 = l2;
        l2 <<= 1;
        u1 = 1; 
        u2 = 0;
        for (j = 0; j < l1; j++) 
        {
            for (i = j; i < n; i += l2) 
            {
                i1 = i + l1;
                t1 = u1 * x[i1] - u2 * y[i1];
                t2 = u1 * y[i1] + u2 * x[i1];
                x[i1] = x[i] - t1; 
                y[i1] = y[i] - t2;
                x[i] += t1;
                y[i] += t2;
            }
            z = u1 * c1 - u2 * c2;
            u2 = u1 * c2 + u2 * c1;
            u1 = z;
        }
        c2 = Math.Sqrt((1.0 - c1) / 2.0);
        if (direction == Direction.Forward) 
            c2 = -c2;
        c1 = Math.Sqrt((1.0 + c1) / 2.0);
    }

    /* Scaling for forward transform */
    if (direction == Direction.Forward)
    {
        for (i = 0; i < n; ++i)
        {
            x[i] /= n;
            y[i] /= n;
            if (Math.Abs(x[i]) > data.ExtremeReal)
            {
                data.ExtremeReal = x[i];
                data.IndexExtremeReal = (int)i;
            }
            if (Math.Abs(y[i]) > data.ExtremeImag)
            {
                data.ExtremeImag = y[i];
                data.IndexExtremeImag = (int)i;
            }
        }
    }
}

[Image: FFT (http://www.rghware.com/fft.png)]

I created the drop in CPU seen in the middle of the graph by selecting the "Native DLL FFT" in my app:

http://www.rghware.com/InstrumentTuner.zip (source code)

I think this will run on most PCs. You’ll need to have DirectX installed. I had some issues using the capture settings for certain hardware. The capture settings were supposed to be configurable, but the app’s development has been sidetracked by this interesting find.

Anyway, why am I seeing a 20% increase in speed using the native code? This seems to fly in the face of some of the assumptions I previously had.

Update

After converting the function to an unsafe method and fixing the long/int issue, the new unsafe method actually runs faster than the native method (pretty cool).

It's obvious that array bounds checking is the cause of the 20% slowdown in this FFT method. Because of the way its for-loops index the arrays, the bounds checks in this method cannot be optimized away.

Thanks for all the help, everyone.

Recommended answer

Just looking at this code, I'd suspect from my experience a fairly significant slowdown going from C++ -> C#.

One major issue you're going to face in a naive port of a routine like this to C# is that C# is going to add bounds checking on every array access here. Since you're never looping through the arrays in a way that will get optimized (see this question for details), just about every array access is going to receive bounds checking.
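
To make that cost concrete, here is a rough C++ sketch of what a per-access bounds check amounts to. It is an analogy, not what the CLR JIT literally emits: `checked_at` is a hypothetical helper that mimics the compare-and-branch of a managed array access, and `butterfly_checked` / `butterfly_unchecked` are illustrative names for the inner-loop body from the question.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: mimics the index test a managed runtime performs
// before each array access (compare, branch, throw on failure).
inline double& checked_at(std::vector<double>& a, std::size_t i)
{
    if (i >= a.size())
        throw std::out_of_range("index out of range");
    return a[i];
}

// One butterfly step with an index test on every array access.
void butterfly_checked(std::vector<double>& x, std::vector<double>& y,
                       std::size_t i, std::size_t i1, double u1, double u2)
{
    double t1 = u1 * checked_at(x, i1) - u2 * checked_at(y, i1);
    double t2 = u1 * checked_at(y, i1) + u2 * checked_at(x, i1);
    checked_at(x, i1) = checked_at(x, i) - t1;
    checked_at(y, i1) = checked_at(y, i) - t2;
    checked_at(x, i) += t1;
    checked_at(y, i) += t2;
}

// The same step through raw pointers: no index tests at all.
void butterfly_unchecked(double* x, double* y,
                         std::size_t i, std::size_t i1, double u1, double u2)
{
    double t1 = u1 * x[i1] - u2 * y[i1];
    double t2 = u1 * y[i1] + u2 * x[i1];
    x[i1] = x[i] - t1;
    y[i1] = y[i] - t2;
    x[i] += t1;
    y[i] += t2;
}
```

Written this way, the checked version performs up to ten index tests per butterfly, while the pointer version performs none. An optimizer may merge some of the redundant tests, so the real overhead is likely smaller than the raw count suggests.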

In addition, this port is pretty close to a 1->1 mapping from C. If you run this through a good .NET profiler, you'll probably find some great spots that can be optimized to get this back to near C++ speed with one or two tweaks (that's nearly always been my experience in porting routines like this).

If you want to get this to be at nearly identical speed, though, you'll probably need to convert this to unsafe code and use pointer manipulation instead of directly setting the arrays. This will eliminate all of the bounds checking issues, and get your speed back.


Edit: I see one more huge difference, which may be the reason your C# unsafe code is running slower.

Check out this page on C# compared to C++, in particular:

"The long type: In C#, the long type is 64 bits, while in C++, it is 32 bits."

You should convert the C# version to use int, not long. In C#, long is a 64-bit type. This may actually have a profound effect on your pointer manipulation, because I believe you are inadvertently adding a long->int conversion (with overflow checking) on every pointer call.
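
One caveat on the quoted rule: the width of `long` in C++ is implementation-defined. It is 32 bits with MSVC on Windows (which the `__declspec(dllexport)` in the question suggests is the toolchain here), but 64 bits on typical 64-bit Linux and macOS compilers. A minimal sketch of pinning the widths down on the native side follows; the names `cs_int`, `cs_long`, `native_long_bits` and `point_count` are illustrative and do not come from the original code.

```cpp
#include <climits>
#include <cstdint>

// Fixed-width aliases that match the C# types regardless of compiler.
using cs_int  = std::int32_t;  // same width as C# int  (System.Int32)
using cs_long = std::int64_t;  // same width as C# long (System.Int64)

static_assert(sizeof(cs_int) * CHAR_BIT == 32, "C# int is 32 bits");
static_assert(sizeof(cs_long) * CHAR_BIT == 64, "C# long is 64 bits");

// Width of the native `long` in bits: 32 with MSVC on Windows,
// 64 on typical 64-bit Linux and macOS toolchains.
constexpr int native_long_bits()
{
    return static_cast<int>(sizeof(long) * CHAR_BIT);
}

// Number of FFT points computed with a 32-bit shift instead of pow().
// Assumes 0 <= iterations <= 30; the caller must respect that range.
constexpr cs_int point_count(int iterations)
{
    return static_cast<cs_int>(1) << iterations;
}
```

Using `cs_int` for the loop indices on both sides keeps the two versions doing the same arithmetic, which makes the timing comparison fairer.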

Also, while you're at it, you may want to try wrapping the entire function in an unchecked block. C++ isn't doing the overflow checking you're getting in C#.
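
Whether the C# build performs these checks at all depends on its settings: C# integer arithmetic only throws on overflow inside a `checked` context or when the project is compiled with overflow checking enabled. Where the checks are active, each operation carries a test along the lines of the C++ sketch below. `checked_add` and `plain_add` are hypothetical helpers written for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: addition that throws on overflow, roughly what a
// C# `checked` context does for int arithmetic.
inline std::int32_t checked_add(std::int32_t a, std::int32_t b)
{
    using lim = std::numeric_limits<std::int32_t>;
    if ((b > 0 && a > lim::max() - b) || (b < 0 && a < lim::min() - b))
        throw std::overflow_error("integer overflow");
    return a + b;
}

// Plain addition with no test, as in the native code from the question.
// The caller must keep the result in range: signed overflow is undefined
// behavior in C++.
inline std::int32_t plain_add(std::int32_t a, std::int32_t b)
{
    return a + b;
}
```

The extra comparisons and branch are cheap individually, but they sit on the index arithmetic of the innermost loop, which is why an `unchecked` block is worth trying.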
