Understanding the Neo4j object cache


Question

I'm trying to understand the Neo4j object cache by investigating it. My first impression of the object cache comes from the slides at this link: http://www.slideshare.net/thobe/an-overview-of-neo4j-internals

Specifically, a Node/Relationship object in the cache should look like slide 9 or 15/42. To verify this, I wrote a simple server script that uses existing graph database contents. My approach is to look at the starting virtual address of the node/relationship object using sun.misc.Unsafe. The code for obtaining the virtual address comes from the following question: How can I get the memory location of a object in java?

public static long addressOf(Object o) throws Exception {
    Object[] array = new Object[] { o };

    long baseOffset = unsafe.arrayBaseOffset(Object[].class);
    int addressSize = unsafe.addressSize();
    long objectAddress;
    switch (addressSize) {
    case 4:
        objectAddress = unsafe.getInt(array, baseOffset);
        break;
    case 8:
        objectAddress = unsafe.getLong(array, baseOffset);
        break;
    default:
        throw new Error("unsupported address size: " + addressSize);
    }
    return (objectAddress);
}

In the Neo4j server script (my main() class), I fetch each node by id and print its address as follows:

void checkAddr(){
    nodeAddr(0);
    nodeAddr(1);
    nodeAddr(2);
}

void nodeAddr(int n){
    Node oneNode = graphDb.getNodeById(n);
    Node[] array1 = {oneNode};

    try {
        long address = UnsafeUtil.addressOf(array1);
        System.out.println("Addess: " + address);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

To begin with, I tried the soft cache provider, which is the default. The addresses printed for node objects 0, 1 and 2 are:


Addess: 4168500044
Addess: 4168502383
Addess: 4168502753

Therefore, by subtracting the first address from the second and the second from the third, I should know exactly how much space a node takes. In this case, the first node object takes 2339 B and the second takes 370 B.

Then, to see the impact of disabling the object cache, I configured the database with NoCacheProvider:


setConfig(GraphDatabaseSettings.cache_type, NoCacheProvider.NAME)

The addresses printed out are:


Addess: 4168488391
Addess: 4168490708
Addess: 4168491056

The offsets, calculated as in the first case, are: the first node object takes 2317 B and the second takes 348 B.

My questions are:


  1. Since I'm using the same graph and doing read-only queries, why does the size of the same node object change?

  2. When I disable the object cache, why do the address offsets look the same as when the object cache is enabled? For example, in the node store file a single node takes 9 bytes, which is not what I see in my experiment. If the way I'm getting the node object is problematic, how can I obtain the virtual address correctly? And is there any way to find out exactly where the mmapped node file resides in memory?

  3. How can I know exactly what is stored in a node object? When I looked at Node.class at this link: https://github.com/neo4j/neo4j/blob/1.9.8/community/kernel/src/main/java/org/neo4j/graphdb/Node.java it doesn't seem that a node object looks the way it does in the presentation slides. Rather, it is just a group of functions used by the node object. Further, is a node object brought into memory as a whole at once, both with and without the object cache?


Recommended answer

The Node object is not what Neo4j stores in the "object cache", so you are not going to gain much insight into Neo4j's caching by looking at those instances. The implementations of Node that Neo4j gives you are instances of a class called NodeProxy, and they are as small as they can possibly be (two fields: the internal id and a reference to the database). They just serve as your handle on the node for performing operations around that node in the database. The objects stored in the "object cache" are instances of a class called NodeImpl (and despite the name, they do not implement the Node interface). The NodeImpl objects have the shape outlined on the 15th slide (page number 9 within the slide) of that presentation. Well, they roughly have that shape; Neo4j has evolved since I made those slides.

The evolution of Neo4j has also changed the number of bytes that node records occupy on disk. Neo4j 2.0 and later have slightly larger node records than those slides show. If you are interested in the layout of those records, look at the NodeRecord class, then start from the NodeStore class and work "downwards" into its dependencies to find the memory mapping.

Besides looking at the wrong object to see the difference between the cache approaches in Neo4j, your approach to measuring is flawed. Comparing the addresses of objects does not tell you anything about the sizes of those objects. The JVM makes no guarantee that two objects allocated one after the other (in time) will reside adjacently in memory, and even if the JVM did use such an allocation policy, Neo4j might have allocated multiple objects in between the allocations of the two objects you are comparing. Then there is the garbage collector, which might have moved the objects around between the moment you got the address of one object and the moment you got the address of the next. Thus looking at the addresses of objects in Java is pretty much never useful for anything. For a better approach to measuring the size of an object in Java, take a look at the Java Object Layout utility, or use the Instrumentation.getObjectSize(...) method from a Java agent.
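As a rough illustration of the Java agent route, here is a minimal sketch. The class name ObjectSizeAgent and its helper methods are made up for this example; only Instrumentation.getObjectSize(...) and the premain entry point are standard JDK mechanisms. It assumes the class is packaged in a jar whose manifest declares Premain-Class: ObjectSizeAgent and that the JVM is started with -javaagent pointing at that jar.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical helper class (not part of Neo4j or the JDK).
public final class ObjectSizeAgent {
    // Set by the JVM when this class is loaded as an agent; stays null otherwise.
    private static volatile Instrumentation instrumentation;

    private ObjectSizeAgent() {
    }

    // Standard agent entry point, invoked before main() when the JVM
    // is started with -javaagent:<path-to-agent-jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // True only if the JVM actually loaded this class as an agent.
    public static boolean isAvailable() {
        return instrumentation != null;
    }

    // Returns the JVM's estimate of the shallow size of the object in bytes.
    // Objects referenced from its fields are NOT included.
    public static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException(
                "Not running as an agent; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

With the agent loaded, calling ObjectSizeAgent.sizeOf(node) on a Node handle would report the shallow size of that handle only. The value is implementation-specific, so it is best treated as an estimate rather than an exact layout.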

To answer your questions as stated:


  1. The sizes of the node objects are not changing; their addresses are simply not guaranteed to be the same between runs. As described above, you cannot rely on object addresses to compute object sizes.

  2. Since you are looking at NodeProxy objects, they will look the same regardless of which caching strategy Neo4j uses. To look at the NodeImpl objects you have to dig quite deep into the internals of Neo4j. Since it looks like you are using Neo4j 1.9, you would cast your GraphDatabaseService instance to GraphDatabaseAPI (an interface that is internal to the implementation) and then invoke the getNodeManager() method on that object. From the NodeManager you can call getNodeIfCached( node.getId() ) to get a NodeImpl object. Please note that this API will not be compatible between versions of Neo4j, and using it is one of those "warranty void if seal broken" kind of situations...

  3. Look at the source code for NodeImpl instead. As to when and how data is brought into the cache, Neo4j tries to be lazy about that, loading only the data you use. If you get the relationships of a node, those are loaded into the cache, and if you get properties, those are loaded into the cache. If you only get relationships, the properties are never loaded, and vice versa.
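The handle-versus-cached-object split and the lazy loading described above can be pictured with a toy model. This is not Neo4j code: every class here (Store, CachedNode, Database, NodeHandle) is hypothetical and only loosely mirrors the roles of the store files, NodeImpl, the database and NodeProxy. It is a sketch of the general pattern under those assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these classes exist in Neo4j.
public class LazyNodeCacheSketch {

    // Stand-in for the on-disk store; counts how often each part is read.
    static final class Store {
        int relationshipLoads = 0;
        int propertyLoads = 0;

        long[] loadRelationships(long nodeId) {
            relationshipLoads++;
            return new long[] { nodeId + 1, nodeId + 2 };
        }

        Map<String, Object> loadProperties(long nodeId) {
            propertyLoads++;
            Map<String, Object> props = new HashMap<>();
            props.put("name", "node-" + nodeId);
            return props;
        }
    }

    // Cached entry (the role NodeImpl plays): fields stay null until first use.
    static final class CachedNode {
        final long id;
        long[] relationships;
        Map<String, Object> properties;

        CachedNode(long id) {
            this.id = id;
        }
    }

    // Owns the store and the object cache.
    static final class Database {
        final Store store = new Store();
        final Map<Long, CachedNode> cache = new HashMap<>();

        CachedNode cached(long id) {
            return cache.computeIfAbsent(id, k -> new CachedNode(k));
        }
    }

    // Lightweight handle (the role NodeProxy plays): just an id and a database reference.
    static final class NodeHandle {
        private final long id;
        private final Database db;

        NodeHandle(long id, Database db) {
            this.id = id;
            this.db = db;
        }

        long[] getRelationships() {
            CachedNode n = db.cached(id);
            if (n.relationships == null) {
                n.relationships = db.store.loadRelationships(id); // loaded on first access only
            }
            return n.relationships;
        }

        Object getProperty(String key) {
            CachedNode n = db.cached(id);
            if (n.properties == null) {
                n.properties = db.store.loadProperties(id); // loaded on first access only
            }
            return n.properties.get(key);
        }
    }

    public static void main(String[] args) {
        Database db = new Database();
        NodeHandle node = new NodeHandle(0, db);
        node.getRelationships();
        node.getRelationships();
        System.out.println("relationship loads: " + db.store.relationshipLoads);
        System.out.println("property loads: " + db.store.propertyLoads);
    }
}
```

In this sketch the handle stays the same size however much data has been loaded, which is why inspecting handles reveals nothing about the cache. Reading only relationships leaves the property side of the cached entry untouched.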
