Invoking native code with hand-written assembly

Question

I'm trying to call a native function from a managed assembly. I've done this with pre-compiled libraries and everything went well. Now I'm building my own library, and I can't get it to work.

The native DLL source is the following:

#define DERM_SIMD_EXPORT        __declspec(dllexport)

#define DERM_SIMD_API           __cdecl

extern "C" {

    DERM_SIMD_EXPORT void DERM_SIMD_API Matrix4x4_Multiply_SSE(float *result, float *left, float *right);

}

void DERM_SIMD_API Matrix4x4_Multiply_SSE(float *result, float *left, float *right) {
    __asm {
       ....
    }
}

Below is the managed code, which loads the library and creates a delegate from a function pointer.

public unsafe class Simd
{
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate void MatrixMultiplyDelegate(float* result, float* left, float* right);

    public static MatrixMultiplyDelegate MatrixMultiply;

    public static void LoadSimdExtensions()
    {
        string assemblyPath = "Derm.Simd.dll";

        IntPtr address = GetProcAddress.GetAddress(assemblyPath, "Matrix4x4_Multiply_SSE");

        if (address != IntPtr.Zero) {
            MatrixMultiply = (MatrixMultiplyDelegate)Marshal.GetDelegateForFunctionPointer(address, typeof(MatrixMultiplyDelegate));
        }
    }
}

With the sources above, the code runs without errors (the function pointer is obtained, and the delegate is actually created).

The problem arises when I call the delegate: it is executed (and I can even debug it!), but at function exit the managed application raises a System.ExecutionEngineException (when it doesn't simply exit without any exception).

The actual problem is the function implementation: it contains an asm block with SSE instructions; if I remove the asm block, the code works perfectly.

I suspect I am missing some register save/restore code in the assembly, but I'm completely ignorant in this area.

The strange thing is that if I change the calling convention to __stdcall, the debug version "seems" to work, while the release version behaves as if the __cdecl calling convention were used.

(And while we are here, can you clarify whether the calling convention matters?)

OK, thanks to David Heffernan's comment I found out that the bad instructions causing the problem are the following:

 movups result[ 0], xmm4;
 movups result[16], xmm5;

The movups instruction moves 16 bytes into (possibly unaligned) memory.

The function is called by the following code:

 unsafe {
    float* prodFix = (float*)prod.MatrixBuffer.AlignedBuffer.ToPointer();
    float* m1Fix = (float*)m2.MatrixBuffer.AlignedBuffer.ToPointer();
    float* m2Fix = (float*)m1.MatrixBuffer.AlignedBuffer.ToPointer();

    if (Simd.Simd.MatrixMultiply == null) {
        // ... unsafe C# code
    } else {
        Simd.Simd.MatrixMultiply(prodFix, m1Fix, m2Fix);
    }
}

Here MatrixBuffer is a class of mine; its member AlignedBuffer is allocated in the following way:

// Allocate unmanaged buffer
mUnmanagedBuffer = Marshal.AllocHGlobal(new IntPtr((long)(size + alignment - 1)));

// Align buffer pointer
long misalignment = mUnmanagedBuffer.ToInt64() % alignment;
if (misalignment != 0)
    mAlignedBuffer = new IntPtr(mUnmanagedBuffer.ToInt64() + misalignment);
else
    mAlignedBuffer = mUnmanagedBuffer;

Maybe the error is caused by Marshal.AllocHGlobal or some IntPtr black magic?
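
A side note on the alignment step above: adding misalignment to the base address does not generally yield an aligned pointer (a base address ending in 0x4 with a 16-byte alignment ends up at 0x8). The usual rounding adds alignment - misalignment instead. The following is a minimal sketch of that arithmetic, written in C++ rather than C#; the name align_up is hypothetical and not part of the original code. Since movups accepts unaligned addresses, this is most likely a separate issue from the crash described in the question.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original post): rounds `address` up to
// the next multiple of `alignment`. `alignment` must be greater than zero.
std::uintptr_t align_up(std::uintptr_t address, std::uintptr_t alignment)
{
    assert(alignment > 0);
    const std::uintptr_t misalignment = address % alignment;
    // Already aligned: keep the address. Otherwise skip the remaining bytes
    // up to the next boundary, i.e. (alignment - misalignment), not misalignment.
    return misalignment == 0 ? address : address + (alignment - misalignment);
}
```

Over-allocating by alignment - 1 bytes, as the snippet above already does, guarantees that the rounded-up address still leaves size usable bytes inside the allocation.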

This is the minimal source that reproduces the error:

void Matrix4x4_Multiply_SSE(float *result, float *left, float *right)
{
    __asm {
        movups xmm0,    right[ 0];

        movups result, xmm0;
    }
}


int main(int argc, char *argv[])
{
    float r0[16];
    float m1[16], m2[16];

    m1[ 0] = 1.0f; m1[ 4] = 0.0f; m1[ 8] = 0.0f; m1[12] = 0.0f;
    m1[ 1] = 0.0f; m1[ 5] = 1.0f; m1[ 9] = 0.0f; m1[13] = 0.0f;
    m1[ 2] = 0.0f; m1[ 6] = 0.0f; m1[10] = 1.0f; m1[14] = 0.0f;
    m1[ 3] = 0.0f; m1[ 7] = 0.0f; m1[11] = 0.0f; m1[15] = 1.0f;

    m2[ 0] = 1.0f; m2[ 4] = 0.0f; m2[ 8] = 0.0f; m2[12] = 0.0f;
    m2[ 1] = 0.0f; m2[ 5] = 1.0f; m2[ 9] = 0.0f; m2[13] = 0.0f;
    m2[ 2] = 0.0f; m2[ 6] = 0.0f; m2[10] = 1.0f; m2[14] = 0.0f;
    m2[ 3] = 0.0f; m2[ 7] = 0.0f; m2[11] = 0.0f; m2[15] = 1.0f;

    r0[ 0] = 0.0f; r0[ 4] = 0.0f; r0[ 8] = 0.0f; r0[12] = 0.0f;
    r0[ 1] = 0.0f; r0[ 5] = 0.0f; r0[ 9] = 0.0f; r0[13] = 0.0f;
    r0[ 2] = 0.0f; r0[ 6] = 0.0f; r0[10] = 0.0f; r0[14] = 0.0f;
    r0[ 3] = 0.0f; r0[ 7] = 0.0f; r0[11] = 0.0f; r0[15] = 0.0f;

    Matrix4x4_Multiply_SSE(r0, m1, m2);
    Matrix4x4_Multiply_SSE(r0, m1, m2);

    return (0);
}

Practically, after the second movups the value of result (stored on the stack) is changed, and the values of xmm0 are stored at the modified (and wrong) address held in result.

After stepping out of Matrix4x4_Multiply_SSE, the original memory isn't modified.

What am I missing?

Recommended answer

Your assembly is flawed. There is a difference between

void DoSomething(int *x)
{
    __asm
    {
        mov x[0], 10   // wrong
        mov [x], 10    // also wrong
        mov esi,x      // first get address
        mov [esi],500  // then assign - correct
    }
}

The first two examples do not write to the memory location the pointer points to, but to the storage location of the pointer itself. Since the parameter lives on the stack, your movups instruction overwrote your stack. You can see this in the debugger window when you call e.g.

int x=0;
DoSomething(&x);

With mov [x],10 you do not set x to 10; you write into your stack.
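
The same distinction can be sketched in portable C++ without inline assembly. This is only an illustration under stated assumptions, not the original poster's code: all three function names are made up, pointers are assumed to be the same size as std::uintptr_t, and the last function swaps the inline movups pair for the _mm_loadu_ps / _mm_storeu_ps intrinsics from <xmmintrin.h> (with a memcpy fallback where SSE is unavailable).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

static_assert(sizeof(std::uintptr_t) == sizeof(int *),
              "sketch assumes pointers and uintptr_t have the same size");

// Analogue of `mov [x], 10`: the bytes land in the parameter's own storage,
// so the caller's int is never touched. Returns the value now in that slot.
std::uintptr_t write_to_pointer_slot(int *x)
{
    const std::uintptr_t ten = 10;
    std::memcpy(&x, &ten, sizeof ten);    // overwrites the pointer, not *x
    std::uintptr_t slot;
    std::memcpy(&slot, &x, sizeof slot);  // read the slot back, no dereference
    return slot;
}

// Analogue of `mov esi, x` followed by `mov [esi], 500`:
// first fetch the address, then store through it.
void write_through_pointer(int *x)
{
    assert(x != nullptr);
    *x = 500;
}

// Copies the first four floats of `right` into `result`; neither pointer
// needs to be 16-byte aligned.
void copy_first_row(float *result, const float *right)
{
#ifdef SKETCH_HAVE_SSE
    const __m128 row = _mm_loadu_ps(right);  // unaligned 16-byte load
    _mm_storeu_ps(result, row);              // unaligned store to *result
#else
    std::memcpy(result, right, 4 * sizeof(float));
#endif
}
```

Calling write_to_pointer_slot(&x) leaves x unchanged, while write_through_pointer(&x) sets it to 500. With intrinsics the compiler dereferences result and right itself, so the pointer variables on the stack are never used as store destinations.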
