In managed code, how do I achieve good locality of reference?


Question

Since RAM seems to be the new disk, and since that statement also means that memory access is now considered slow in much the same way disk access always has been, I want to maximize locality of reference in memory for high-performance applications. For example, in a sorted index, I want adjacent values to be close together (unlike, say, in a hash table), and I want the data the index points to to be close by, too.

In C, I can whip up a data structure with a specialized memory manager, as the developers of the (immensely complex) Judy array did. With direct control over the pointers, they even went so far as to encode additional information in the pointer value itself. When working in Python, Java or C#, I am deliberately one or more levels of abstraction away from this type of solution, and I'm entrusting the JIT compilers and optimizing runtimes with doing clever tricks at the low levels for me.

Still, I guess that even at this high level of abstraction there are things that can be semantically considered "closer" and are therefore likely to actually be closer at the low levels. For example, I was wondering about the following (my guesses in parentheses):


  • Can I expect an array to be an adjacent block of memory (yes)?
  • Are two integers in the same instance closer than two in different instances of the same class (probably)?
  • Does an object occupy a contiguous region in memory (no)?
  • What's the difference between an array of objects with only two int fields and a single object with two int[] fields? (This example is probably Java specific.)

I started wondering about these in a Java context, but my question has become more general, so I'd suggest not treating this as a Java question.

Recommended answer


  • In .NET, elements of an array are certainly contiguous. In Java I'd expect them to be in most implementations, but it appears not to be guaranteed.
  • I think it's reasonable to assume that the memory used by an instance for fields is in a single block... but don't forget that some of those fields may be references to other objects.
    For the Java array part, Sun's JNI documentation includes this comment, tucked away in a discussion about strings:

    For example, the Java virtual machine may not store arrays contiguously.

    For your last question, if you have two int[] then each of those arrays will be a contiguous block of memory, but they could be very "far apart" in memory. If you have an array of objects with two int fields, then the objects could be a long way from one another, but the two integers within each object will be close together. Potentially more importantly, you'll end up taking a lot more memory with the "lots of objects" solution due to the per-object overhead. In .NET you could use a custom struct with two integers instead, and have an array of those; that would keep all the data in one big block.
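To make the three layouts concrete, here is a minimal Java sketch. The class and method names (`PairLayouts`, `Pair`, `sumObjects`, `sumParallel`, `sumInterleaved`) are made up for illustration and do not come from the answer. The interleaved `int[]` is only an approximation of what an array of two-int structs gives you in .NET, and the layout comments describe what typical JVMs are expected to do, not something the Java specification guarantees.

```java
/** Three ways to hold N (x, y) int pairs. All names are hypothetical. */
public class PairLayouts {

    /** One heap object per pair; a Pair[] holds references, not the ints themselves. */
    public static final class Pair {
        final int x;
        final int y;

        Pair(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Array of objects: x and y of one pair are adjacent, but the pairs may be scattered. */
    public static long sumObjects(Pair[] pairs) {
        long sum = 0;
        for (Pair p : pairs) {
            sum += (long) p.x + p.y; // one pointer dereference per element
        }
        return sum;
    }

    /** Two parallel arrays: each is expected to be one block, but xs[i] and ys[i] may be far apart. */
    public static long sumParallel(int[] xs, int[] ys) {
        long sum = 0;
        for (int i = 0; i < xs.length; i++) {
            sum += (long) xs[i] + ys[i];
        }
        return sum;
    }

    /** One interleaved array: x at index 2*i, y at index 2*i + 1, with no per-pair object header. */
    public static long sumInterleaved(int[] xy) {
        long sum = 0;
        for (int i = 0; i < xy.length; i += 2) {
            sum += (long) xy[i] + xy[i + 1];
        }
        return sum;
    }

    /** Builds the array-of-objects layout from two parallel arrays of equal length. */
    public static Pair[] toObjects(int[] xs, int[] ys) {
        Pair[] pairs = new Pair[xs.length];
        for (int i = 0; i < xs.length; i++) {
            pairs[i] = new Pair(xs[i], ys[i]);
        }
        return pairs;
    }

    /** Builds the interleaved layout from two parallel arrays of equal length. */
    public static int[] interleave(int[] xs, int[] ys) {
        int[] xy = new int[xs.length * 2];
        for (int i = 0; i < xs.length; i++) {
            xy[2 * i] = xs[i];
            xy[2 * i + 1] = ys[i];
        }
        return xy;
    }

    public static void main(String[] args) {
        int[] xs = {1, 2, 3};
        int[] ys = {10, 20, 30};
        // All three layouts hold the same data, so the sums must agree.
        System.out.println(sumObjects(toObjects(xs, ys)));
        System.out.println(sumParallel(xs, ys));
        System.out.println(sumInterleaved(interleave(xs, ys)));
    }
}
```

All three methods compute the same result; what differs is how many separate heap objects the traversal touches. Whether that difference is measurable depends on the JVM, the collector and the data size, so it is worth benchmarking on your own workload before restructuring anything.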

    I believe that in both Java and .NET, if you allocate a lot of smallish objects in quick succession within a single thread, then those objects are likely to have good locality of reference. When the GC compacts a heap, this may improve, or it may potentially become worse, if a heap with

    A B C D E
    

    is compacted to

    A D E B
    

    (where C is collected): suddenly A and B, which may have been "close" before, are far apart. I don't know whether this actually happens in any garbage collector (there are loads around!) but it's possible.

    Basically, in a managed environment you don't usually have as much control over locality of reference as you do in an unmanaged environment. You have to trust that the managed environment is sufficiently good at managing it, and that you'll have saved enough time by coding to a higher-level platform to let you spend time optimising elsewhere.
