Overhead of a .NET array?

Question

I was trying to determine the overhead of the header on a .NET array (in a 32-bit process) using this code:

long bytes1 = GC.GetTotalMemory(false);
object[] array = new object[10000];
for (int i = 0; i < 10000; i++)
    array[i] = new int[1];
long bytes2 = GC.GetTotalMemory(false);
array[0] = null; // ensure no garbage collection before this point

Console.WriteLine(bytes2 - bytes1);
// Calculate array overhead in bytes by subtracting the size of 
// the array elements (40000 for object[10000] and 4 for each 
// array), and dividing by the number of arrays (10001)
Console.WriteLine("Array overhead: {0:0.000}", 
                  ((double)(bytes2 - bytes1) - 40000) / 10001 - 4);
Console.Write("Press any key to continue...");
Console.ReadKey();

The result is:

    204800
    Array overhead: 12.478

In a 32-bit process, object[1] should be the same size as int[1], but when the inner arrays are object[1] instead of int[1], the overhead jumps by 3.28 bytes to:

    237568
    Array overhead: 15.755

Does anyone know why?

(By the way, if anyone's curious, the overhead for non-array objects, e.g. (object)i in the loop above, is about 8 bytes (8.384). I heard it's 16 bytes in 64-bit processes.)
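The arithmetic behind the two reported figures can be checked with a small sketch. It assumes that the second measurement (237568 bytes) came from the same program with the inner arrays changed to object[1]; the class and method names are illustrative only.

```csharp
using System;

static class OverheadMath
{
    // Mirrors the question's formula: subtract the 40000 bytes of references
    // held by the outer object[10000], divide by the 10001 arrays allocated,
    // then subtract the 4 bytes of element data in each one-element array.
    public static double PerArrayOverhead(long measuredBytes)
    {
        return ((double)measuredBytes - 40000) / 10001 - 4;
    }

    static void Main()
    {
        double intArrays = PerArrayOverhead(204800);    // inner arrays are int[1]
        double objectArrays = PerArrayOverhead(237568); // assumed: inner arrays are object[1]

        Console.WriteLine("{0:0.000}", intArrays);
        Console.WriteLine("{0:0.000}", objectArrays);
        Console.WriteLine("{0:0.00}", objectArrays - intArrays);
    }
}
```

Because the formula also spreads the outer array's own header across all 10001 arrays, the figures come out fractional rather than as whole byte counts.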

Recommended answer

Here's a slightly neater (IMO) short but complete program to demonstrate the same thing:

using System;

class Test
{
    const int Size = 100000;

    static void Main()
    {
        object[] array = new object[Size];
        long initialMemory = GC.GetTotalMemory(true);
        for (int i = 0; i < Size; i++)
        {
            array[i] = new string[0];
        }
        long finalMemory = GC.GetTotalMemory(true);
        GC.KeepAlive(array);

        long total = finalMemory - initialMemory;

        Console.WriteLine("Size of each element: {0:0.000} bytes",
                          ((double)total) / Size);
    }
}

But I get the same results - the overhead for any reference type array is 16 bytes, whereas the overhead for any value type array is 12 bytes. I'm still trying to work out why that is, with the help of the CLI spec. Don't forget that reference type arrays are covariant, which may be relevant...
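To compare reference-type and value-type element arrays side by side, the same measurement can be wrapped in a helper that takes a factory delegate. This is only a sketch: ArrayOverheadProbe and AverageBytes are made-up names, and the numbers it prints depend on the runtime and process bitness, so no expected output is shown.

```csharp
using System;

static class ArrayOverheadProbe
{
    const int Count = 100000;

    // Allocates Count objects via the supplied factory and returns the
    // average number of bytes the managed heap grew per object.
    public static double AverageBytes(Func<object> factory)
    {
        object[] holder = new object[Count];
        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < Count; i++)
        {
            holder[i] = factory();
        }
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(holder); // keep the allocations reachable until measured
        return (double)(after - before) / Count;
    }

    static void Main()
    {
        Console.WriteLine("string[0]: {0:0.000} bytes", AverageBytes(() => new string[0]));
        Console.WriteLine("int[0]:    {0:0.000} bytes", AverageBytes(() => new int[0]));
        Console.WriteLine("byte[0]:   {0:0.000} bytes", AverageBytes(() => new byte[0]));
    }
}
```

Allocating zero-length arrays means the whole measured size is header, so the two kinds of array can be compared directly.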

With the help of cordbg, I can confirm Brian's answer - the type pointer of a reference-type array is the same regardless of the actual element type. Presumably there's some funkiness in object.GetType() (which is non-virtual, remember) to account for this.

So, with this code:

object[] x = new object[1];
string[] y = new string[1];
int[] z = new int[1];
z[0] = 0x12345678;
lock(z) {}

We end up with something like the following:

Variables:
x=(0x1f228c8) <System.Object[]>
y=(0x1f228dc) <System.String[]>
z=(0x1f228f0) <System.Int32[]>

Memory:
0x1f228c4: 00000000 003284dc 00000001 00326d54 00000000 // Data for x
0x1f228d8: 00000000 003284dc 00000001 00329134 00000000 // Data for y
0x1f228ec: 00000000 00d443fc 00000001 12345678 // Data for z

Note that I've dumped the memory 1 word before the value of the variable itself.

For x and y, the values are:

  • The sync block, used for locking and the hash code (or a thin lock - see Brian's comment)
  • Type pointer
  • Size of array
  • Element type pointer
  • Null reference (first element)

For z, the values are:

  • The sync block
  • Type pointer
  • Size of array
  • 0x12345678 (first element)

Different value type arrays (byte[], int[] etc) end up with different type pointers, whereas all reference type arrays use the same type pointer, but have a different element type pointer. The element type pointer is the same value as you'd find as the type pointer for an object of that type. So if we looked at a string object's memory in the above run, it would have a type pointer of 0x00329134.
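The word counts in the dump can be turned into a rough size estimate for 32-bit vectors. The sketch below encodes only what the dump shows (three header words for value-type element arrays, four for reference-type element arrays); the names are hypothetical, and rounding up to a whole word is an assumption rather than something the dump demonstrates.

```csharp
using System;

static class VectorLayout32
{
    const int WordSize = 4; // 32-bit process, as in the dump above

    // Header words seen in the dump: sync block, type pointer, length,
    // plus an element type pointer for reference-type element arrays only.
    public static int HeaderBytes(bool referenceElements)
    {
        return (referenceElements ? 4 : 3) * WordSize;
    }

    // Estimated total size of a vector. Rounding up to a whole word is an
    // assumption of this sketch.
    public static int EstimatedBytes(int length, int elementSize, bool referenceElements)
    {
        int raw = HeaderBytes(referenceElements) + length * elementSize;
        return (raw + WordSize - 1) / WordSize * WordSize;
    }

    static void Main()
    {
        Console.WriteLine(EstimatedBytes(1, 4, true));  // object[1] or string[1]: 20
        Console.WriteLine(EstimatedBytes(1, 4, false)); // int[1]: 16
        Console.WriteLine(EstimatedBytes(0, 4, true));  // string[0]: 16
    }
}
```

The 20-byte estimate for x and y agrees with the dump, where the blocks for x, y and z start 0x14 bytes apart, and the 16-byte estimate for string[0] matches the per-element figure measured by the program earlier in the answer.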

The word before the type pointer certainly has something to do with either the monitor or the hash code: calling GetHashCode() populates that bit of memory, and I believe the default object.GetHashCode() obtains a sync block to ensure hash code uniqueness for the lifetime of the object. However, just doing lock(x){} didn't do anything, which surprised me...

All of this is only valid for "vector" types, by the way - in the CLR, a "vector" type is a single-dimensional array with a lower-bound of 0. Other arrays will have a different layout - for one thing, they'd need the lower bound stored...
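The distinction between a vector and a general single-dimensional array is visible from managed code. The following sketch uses Array.CreateInstance with an explicit lower bound to build a one-based array; the class and method names are illustrative only.

```csharp
using System;

static class VectorVersusArray
{
    public static Array MakeOneBased(int length)
    {
        // Array.CreateInstance with explicit lower bounds can produce a
        // single-dimensional array that does not start at index 0.
        return Array.CreateInstance(typeof(int), new[] { length }, new[] { 1 });
    }

    static void Main()
    {
        Array vector = new int[5];
        Array oneBased = MakeOneBased(5);

        Console.WriteLine(vector.GetLowerBound(0));   // 0
        Console.WriteLine(oneBased.GetLowerBound(0)); // 1
        Console.WriteLine(vector is int[]);           // True
        Console.WriteLine(oneBased is int[]);         // False: not a vector
        Console.WriteLine(oneBased.GetType());        // a different runtime type from int[]
    }
}
```

The one-based array is a distinct runtime type from int[], which is consistent with it needing a different layout.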

So far this has been experimentation, but here's the guesswork - the reason for the system being implemented the way it has been. From here on, I really am just guessing.

  • All object[] arrays can share the same JIT code. They're going to behave the same way in terms of memory allocation, array access, Length property and (importantly) the layout of references for the GC. Compare that with value type arrays, where different value types may have different GC "footprints" (e.g. one might have a byte and then a reference, others will have no references at all, etc).
  • Every time you assign a value within an object[] the runtime needs to check that it's valid. It needs to check that the type of the object whose reference you're using for the new element value is compatible with the element type of the array. For instance:

object[] x = new object[1];
object[] y = new string[1];
x[0] = new object(); // Valid
y[0] = new object(); // Invalid - will throw an exception

This is the covariance I mentioned earlier. Now given that this is going to happen for every single assignment, it makes sense to reduce the number of indirections. In particular, I suspect you don't really want to blow the cache by having to go to the type object for each assignment to get the element type. I suspect (and my x86 assembly isn't good enough to verify this) that the test is something like:

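  • Is the value to be copied a null reference? If so, that's fine. (Done.)
  • Fetch the type pointer of the object the reference points at.
  • Is that type pointer the same as the element type pointer (simple binary equality check)? If so, that's fine. (Done.)
  • Is that type pointer assignment-compatible with the element type pointer? (A more complicated check, with inheritance and interfaces involved.) If so, that's fine - otherwise, throw an exception.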

If we can terminate the search in the first three steps, there's not a lot of indirection - which is good for something that's going to happen as often as array assignments. None of this needs to happen for value type assignments, because that's statically verifiable.
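The guessed sequence of checks can be written out in managed code as a rough illustration. This is not the runtime's actual implementation: it works on System.Type objects rather than raw type pointers, and CovariantStore and Store are made-up names.

```csharp
using System;

static class CovariantStore
{
    // Managed-code sketch of the guessed check; the real runtime would
    // compare raw type pointers rather than System.Type objects.
    public static void Store(object[] array, int index, object value)
    {
        if (value != null)                                   // step 1: null is always fine
        {
            Type elementType = array.GetType().GetElementType();
            Type valueType = value.GetType();                // step 2: type of the referenced object
            if (valueType != elementType                     // step 3: cheap equality test
                && !elementType.IsAssignableFrom(valueType)) // step 4: inheritance and interfaces
            {
                throw new ArrayTypeMismatchException();
            }
        }
        array[index] = value; // the runtime performs its own check here as well
    }

    static void Main()
    {
        object[] strings = new string[1];
        Store(strings, 0, "ok"); // string into string[]: passes at step 3
        try
        {
            Store(strings, 0, new object()); // object into string[]: fails step 4
        }
        catch (ArrayTypeMismatchException)
        {
            Console.WriteLine("rejected");
        }
    }
}
```

In the common case of storing an object whose type exactly matches the element type, the check ends at the equality test, which is where a directly stored element type pointer would save an indirection.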

So, that's why I believe reference type arrays are slightly bigger than value type arrays.

Great question - really interesting to delve into it :)
