Fastest way to copy a blittable struct to an unmanaged memory location (IntPtr)

Question

I have a function similar to the following:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void SetVariable<T>(T newValue) where T : struct {
    // I know by this point that T is blittable (i.e. only unmanaged value types)

    // varPtr is a void*, and is where I want to copy newValue to
    *varPtr = newValue; // This won't work, but is basically what I want to do
}

I saw Marshal.StructureToPtr(), but it seems quite slow, and this is performance-sensitive code. If I knew the type T, I could just declare varPtr as a T*, but... well, I don't.
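For reference, the marshaller-based approach looks roughly like the sketch below. `Vec3` and the `MarshalCopy` helpers are hypothetical names used only for illustration. The likely reason it is slower than a raw store is that it goes through the general-purpose marshalling layer (built to handle non-blittable fields too) rather than compiling down to a plain memory copy.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example struct: only unmanaged fields, so it is blittable.
[StructLayout(LayoutKind.Sequential)]
public struct Vec3
{
    public float X, Y, Z;
}

public static class MarshalCopy
{
    // Copies `value` into unmanaged memory at `dest` via the marshaller.
    // fDeleteOld is false: the destination holds no previously marshalled
    // data whose embedded references would need to be freed first.
    public static void Write<T>(IntPtr dest, T value) where T : struct
    {
        Marshal.StructureToPtr(value, dest, false);
    }

    // Reads the struct back out of unmanaged memory so the copy can be checked.
    public static T Read<T>(IntPtr src) where T : struct
    {
        return Marshal.PtrToStructure<T>(src);
    }
}
```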

Either way, I'm after the fastest possible way to do this. Safety is not a concern: by this point in the code, I know that the struct T fits exactly into the memory pointed to by varPtr.
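On newer compilers, the "declare varPtr as a T*" idea from the question can be written directly. The sketch below assumes C# 7.3 or later (for the `unmanaged` generic constraint, which postdates this question), a modern .NET runtime where `System.Runtime.CompilerServices.Unsafe` is available, and a project compiled with unsafe code enabled. `RawWrite` and `SetVariableStruct` are hypothetical names, and `varPtr` is passed as a parameter here rather than being a field as in the question.

```csharp
using System;
using System.Runtime.CompilerServices;

public static unsafe class RawWrite
{
    // C# 7.3+: the `unmanaged` constraint allows forming a T* for a generic T,
    // so the assignment is a plain sizeof(T)-byte store.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void SetVariable<T>(void* varPtr, T newValue) where T : unmanaged
    {
        *(T*)varPtr = newValue;
    }

    // Keeps the question's `where T : struct` constraint by using Unsafe.Write.
    // This is only valid if T really is blittable (no reference fields), and it
    // assumes varPtr is suitably aligned for T; Unsafe.WriteUnaligned exists
    // for the unaligned case.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void SetVariableStruct<T>(void* varPtr, T newValue) where T : struct
    {
        Unsafe.Write(varPtr, newValue);
    }
}
```

Both forms avoid the marshaller entirely; which one applies mainly depends on whether the calling code can be changed to use the `unmanaged` constraint.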

Recommended answer

One answer is to reimplement native memcpy in C#, using the same optimization tricks that native memcpy uses. Microsoft does this in their own source; see the Buffer.cs file in the Microsoft Reference Source:

    // This is tricky to get right AND fast, so lets make it useful for the whole Fx.
    // E.g. System.Runtime.WindowsRuntime!WindowsRuntimeBufferExtensions.MemCopy uses it.
    internal unsafe static void Memcpy(byte* dest, byte* src, int len) {

        // This is portable version of memcpy. It mirrors what the hand optimized assembly versions of memcpy typically do.
        // Ideally, we would just use the cpblk IL instruction here. Unfortunately, cpblk IL instruction is not as efficient as
        // possible yet and so we have this implementation here for now.

        switch (len)
        {
        case 0:
            return;
        case 1:
            *dest = *src;
            return;
        case 2:
            *(short *)dest = *(short *)src;
            return;
        case 3:
            *(short *)dest = *(short *)src;
            *(dest + 2) = *(src + 2);
            return;
        case 4:
            *(int *)dest = *(int *)src;
            return;
        ...
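The core trick in the excerpt is casting `byte*` to wider integer pointers so that one instruction moves 2, 4 or more bytes. The following is a simplified sketch of that idea, not Microsoft's code: instead of one `case` per length it uses a loop plus a remainder cascade. `SmallCopy` is a hypothetical name; it assumes the buffers do not overlap and ignores alignment, which is fine on x86/x64 but may be slow or unsupported on some other architectures.

```csharp
using System;

public static unsafe class SmallCopy
{
    // Reinterpreting the byte pointers as long*/int*/short* makes the JIT emit
    // 8-, 4- and 2-byte loads/stores instead of a byte-at-a-time loop.
    public static void Copy(byte* dest, byte* src, int len)
    {
        // Bulk of the data: 8 bytes per iteration.
        while (len >= 8)
        {
            *(long*)dest = *(long*)src;
            dest += 8; src += 8; len -= 8;
        }
        // At most 7 bytes remain; the bits of len say which widths are needed.
        if ((len & 4) != 0)
        {
            *(int*)dest = *(int*)src;
            dest += 4; src += 4;
        }
        if ((len & 2) != 0)
        {
            *(short*)dest = *(short*)src;
            dest += 2; src += 2;
        }
        if ((len & 1) != 0)
        {
            *dest = *src;
        }
    }
}
```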

It's interesting to note that they implement memcpy in managed code for all sizes below 512 bytes; most sizes use pointer-casting tricks so that the JIT emits load and store instructions of different widths. Only at 512 bytes and above do they finally P/Invoke into the native memcpy:

        // P/Invoke into the native version for large lengths
        if (len >= 512)
        {
            _Memcpy(dest, src, len);
            return;
        }
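The size-based dispatch can be sketched as follows. Since `_Memcpy` is internal to the framework, this sketch substitutes the public `Buffer.MemoryCopy` for the large-copy path; that substitution, the `DispatchCopy` name and the reuse of 512 as the cutoff are all assumptions. The best threshold depends on the CPU and runtime and would need to be measured.

```csharp
using System;

public static unsafe class DispatchCopy
{
    // Hypothetical cutoff mirroring the 512-byte threshold in Buffer.cs.
    private const int LargeCopyThreshold = 512;

    public static void Copy(byte* dest, byte* src, int len)
    {
        if (len >= LargeCopyThreshold)
        {
            // Stand-in for the internal _Memcpy P/Invoke. Note the argument
            // order: (source, destination, destinationSizeInBytes, bytesToCopy).
            Buffer.MemoryCopy(src, dest, len, len);
            return;
        }
        // Small copies stay in managed code: 8 bytes at a time, then bytes.
        while (len >= 8)
        {
            *(long*)dest = *(long*)src;
            dest += 8; src += 8; len -= 8;
        }
        while (len-- > 0)
        {
            *dest++ = *src++;
        }
    }
}
```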

Presumably, the native memcpy is faster for large copies because it can be hand-optimized to use SIMD (SSE/MMX) instructions to perform the copy.
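The Buffer.cs comment mentions that the `cpblk` IL instruction would be the ideal primitive. On modern .NET it is exposed as `Unsafe.CopyBlockUnaligned`, so a sketch of that route looks like this. `CpblkCopy` is a hypothetical name; whether the JIT expands a given copy inline, uses SIMD, or calls a runtime helper is runtime- and size-dependent, so it should be benchmarked against the alternatives rather than assumed to be fastest.

```csharp
using System;
using System.Runtime.CompilerServices;

public static unsafe class CpblkCopy
{
    // Unsafe.CopyBlockUnaligned emits cpblk with the "unaligned" prefix,
    // so no alignment is assumed for dest or src.
    public static void Copy(void* dest, void* src, int len)
    {
        Unsafe.CopyBlockUnaligned(dest, src, (uint)len);
    }
}
```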
