Neo4j slow? I must be doing something wrong, please tell me what it is

Question

I'm seeing some rather improbable performance results from embedded Neo4j. On the surface it's orders of magnitude slower than expected, so I'm assuming I'm "doing it wrong", although I'm not doing anything complicated.

I'm using the latest embedded python bindings for Neo4j (https://github.com/neo4j/python-embedded)

from neo4j import GraphDatabase
db = GraphDatabase('/tmp/neo4j')

I've created 1500 fake products with simple attributes:

fake_products = [{'name':str(x)} for x in range(0,1500)]

... and created nodes out of them that I connected to a subreference node:

with db.transaction:
    products = db.node()
    db.reference_node.PRODUCTS(products)

    for prod_def in fake_products:
        product = db.node(name=prod_def['name'])
        product.INSTANCE_OF(products)

Now, with what looks to me like almost exactly the same kind of code I've seen in the documentation:

PRODUCTS = db.getNodeById(1) 
for x in PRODUCTS.INSTANCE_OF.incoming: 
    pass

... iterating through these 1500 nodes takes more than 0.2 s on my MacBook Pro. WHAT. (Of course I ran this query a bunch of times, so at least in the Python bindings it's not a matter of cold caches.)

I amped it up to 15k nodes and it took 2 s. I downloaded Gremlin and issued an equivalent query to investigate whether the problem is Neo4j or the Python bindings:

g.v(1).in("INSTANCE_OF")

... it seems it took about 2 s on the first try; on the second run it seemed to complete almost immediately.

Any idea why it's so slow? The results I'm getting have got to be some kind of a mistake on my part.

Recommended answer

This is Neo4j loading data lazily and not doing any prefetching. On the first run you are hitting the disk; on the second, the caches are warm, which is your real production scenario.
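
To see the cold-versus-warm effect in isolation, here is a minimal sketch in plain Python. It does not use Neo4j at all: `LazyStore`, `_read_from_disk` and `time_passes` are hypothetical names made up for illustration, and the "disk read" is just a counter. It only assumes the behaviour described above, where an item is loaded the first time it is touched and served from a cache afterwards.

```python
import time


class LazyStore:
    """Hypothetical stand-in for a lazily loading store (not Neo4j's API)."""

    def __init__(self, n):
        self._n = n
        self._cache = {}
        self.disk_reads = 0  # counts simulated cold loads

    def _read_from_disk(self, key):
        # Simulated expensive load; a real store would do I/O here.
        self.disk_reads += 1
        return {"name": str(key)}

    def __iter__(self):
        for key in range(self._n):
            if key not in self._cache:
                # No prefetching: each item is loaded only when first touched.
                self._cache[key] = self._read_from_disk(key)
            yield self._cache[key]


def time_passes(iterable, passes=3):
    """Iterate over `iterable` several times and return the time of each pass."""
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        for _item in iterable:
            pass
        timings.append(time.perf_counter() - start)
    return timings


store = LazyStore(1500)
timings = time_passes(store, passes=3)
for i, seconds in enumerate(timings, start=1):
    print("pass %d: %.6f s" % (i, seconds))
print("simulated disk reads:", store.disk_reads)
```

Timing several passes separately, rather than a single run, is what shows whether a slow result comes from a cold cache: in this sketch all simulated loads happen during the first pass, and later passes only read from the cache.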
