Can't reproduce/verify the performance claims in the "Graph Databases" and "Neo4j in Action" books


Problem description

UPDATE: I have put up a follow-up question that contains updated scripts and a clearer setup: "neo4j performance compared to mysql (how can it be improved?)". Please continue there.

I have some problems verifying the performance claims made in the "Graph Databases" book (page 20) and in the "Neo4j in Action" book (chapter 1).

To verify these claims I created a sample dataset of 100000 'person' entries with 50 'friends' each, and tried to query for, e.g., friends 4 hops away. I used the very same dataset in mysql. For friends of friends over 4 hops, mysql returns in 0.93 secs, while neo4j needs 65-75 secs (on repeated calls).

How can I improve this miserable result and verify the claims made in the books?

I run the whole setup on an i5-3570K with 16GB RAM, using Ubuntu 12.04 64-bit, java version "1.7.0_25", mysql 5.5.31, and neo4j-community-2.0.0-M03 (I get a similar outcome with 1.9).

All code/sample data can be found at https://github.com/jhb/neo4j-experiements/ (to be used with 2.0.0). The resulting sample data in different formats can be found at https://github.com/jhb/neo4j-testdata.

To use the scripts you need a python with mysql-python, requests and simplejson installed.

  • The dataset is created with friendsdata.py and stored in friends.pickle
  • friends.pickle is imported into neo4j using import_friends_neo4j.py
  • friends.pickle is imported into mysql using import_friends_mysql.py
  • I add indexes on t_user_friend.* in mysql
  • I add an index in neo4j with create index on :node(noscenda_name)
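The first step in the list above can be sketched roughly as follows. This is not the author's friendsdata.py; it is a minimal illustration that assumes friendships are directed, that every person gets exactly 50 distinct friends, and that nobody is their own friend. The names make_friends and save_friends are hypothetical; only the file name friends.pickle comes from the question.

```python
import pickle
import random


def make_friends(num_people=100000, friends_per_person=50, seed=42):
    """Return {person_index: sorted list of friend indexes} (hypothetical helper).

    Assumption: directed links, distinct friends, no self-links.
    """
    rng = random.Random(seed)
    friends = {}
    for person in range(num_people):
        # Sample from num_people - 1 candidates, then shift every pick at or
        # above `person` by one so that `person` itself can never be chosen.
        picks = rng.sample(range(num_people - 1), friends_per_person)
        friends[person] = sorted(p + 1 if p >= person else p for p in picks)
    return friends


def save_friends(friends, path="friends.pickle"):
    """Store the dataset in a pickle file, as the question describes."""
    with open(path, "wb") as fh:
        pickle.dump(friends, fh, protocol=pickle.HIGHEST_PROTOCOL)
```

With the defaults this yields 100000 * 50 = 5,000,000 friend links, which is the number of rows in t_user_friend and of relationships in neo4j under these assumptions.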

To make life a bit easier, the friends.*.bz2 files contain sql and cypher statements to create those datasets in mysql and neo4j 2.0 M3.

I first warm mysql up by querying:

select count(distinct name) from t_user;
select count(distinct name) from t_user;

Then, for the real measurement, I run

python query_friends_mysql.py 4 10

This creates the following sql statement (with changing t_user.name values):

select 
    count(*)
from
    t_user,
    t_user_friend as uf1, 
    t_user_friend as uf2, 
    t_user_friend as uf3, 
    t_user_friend as uf4
where
    t_user.name='person8601' and 
    t_user.id = uf1.user_1 and
    uf1.user_2 = uf2.user_1 and
    uf2.user_2 = uf3.user_1 and
    uf3.user_2 = uf4.user_1;

and repeats this 4-hop query 10 times. The queries need around 0.95 secs each. MySQL is configured to use a key_buffer of 4G.
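How such a statement can be generated for an arbitrary number of hops is sketched below. This is only an illustration of the self-join pattern shown above, not the code in query_friends_mysql.py; the function name friends_sql is hypothetical. The person name is passed as a query parameter, using the %s placeholder style of mysql-python, instead of being pasted into the SQL text.

```python
def friends_sql(hops, name):
    """Build the self-join count query for `hops` friend steps.

    Returns (sql, params), suitable for cursor.execute(sql, params).
    """
    if hops < 1:
        raise ValueError("hops must be >= 1")
    # One alias of t_user_friend per hop: uf1, uf2, ...
    tables = ["t_user"] + ["t_user_friend as uf%d" % i for i in range(1, hops + 1)]
    conds = ["t_user.name = %s", "t_user.id = uf1.user_1"]
    # Chain each hop's target (user_2) to the next hop's source (user_1).
    conds += ["uf%d.user_2 = uf%d.user_1" % (i, i + 1) for i in range(1, hops)]
    sql = "select count(*) from %s where %s" % (", ".join(tables), " and ".join(conds))
    return sql, (name,)
```

Calling friends_sql(4, "person8601") produces a query with the same four aliases and join conditions as the statement above.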

I have modified neo4j.properties:

neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=250M

and the neo4j-wrapper.conf:

wrapper.java.initmemory=2048
wrapper.java.maxmemory=8192

To warm up neo4j I run

start n=node(*) return count(n.noscenda_name);
start r=relationship(*) return count(r);

Then I start using the transactional http endpoint (but I get the same results using the neo4j-shell).

Still warming up, I run

./bin/python query_friends_neo4j.py 3 10

This creates a query of the form (with varying person ids):

{"statement": "match n:node-[r*3..3]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}}

After the 7th call or so, each call needs around 0.7-0.8 secs.

Now for the real thing (4 hops) I run

./bin/python query_friends_neo4j.py 4 10

which creates

{"statement": "match n:node-[r*4..4]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}}

and each call takes between 65 and 75 secs.
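For scale, a rough count of the paths each query has to walk may help. The sketch below assumes every person has exactly 50 outgoing friend links and ignores any deduplication either database might apply, so the figures are upper bounds rather than measured counts.

```python
def max_paths(friends_per_person, hops):
    """Upper bound on the number of paths of length `hops` from one person."""
    return friends_per_person ** hops


for hops in (3, 4):
    # Prints the hop count and the corresponding path bound.
    print(hops, max_paths(50, hops))
```

Under these assumptions, going from 3 to 4 hops multiplies the number of paths by 50 (125,000 versus 6,250,000), so a large jump between the two timings is expected in either database.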

I'd really like to see the claims in the books be reproducible and correct, and neo4j be faster than mysql instead of orders of magnitude slower.

But I don't know what I am doing wrong... :-(

So, my big hopes are:

  • I didn't set up the memory settings for neo4j correctly
  • The query I use for neo4j is completely wrong

Any suggestions to get neo4j up to speed are highly welcome.

Thanks a lot,

Joerg

Recommended answer

2.0 has not been performance-optimized at all, so you should use 1.9.2 for comparison. (If you use 2.0: did you create an index for n.noscenda_name?)

You can check the query plan with profile start ....

With 1.9, please use a manual index or node_auto_index for noscenda_name.

Can you try these queries:

start n=node:node_auto_index(noscenda_name={target})
match n-->()-->()-->m
return count(*);
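The query above spells out three hops. A small helper like the one below could generate the same fixed-length pattern for other hop counts, for example 4 to match the measurement in the question. It is only a sketch: the function name fixed_hops_query is hypothetical, and it assumes the 1.9-style node_auto_index lookup shown above is available.

```python
def fixed_hops_query(hops):
    """Build a Cypher query with an explicit path pattern of `hops` steps."""
    if hops < 1:
        raise ValueError("hops must be >= 1")
    # hops - 1 anonymous intermediate nodes, then the end node m.
    pattern = "n" + "-->()" * (hops - 1) + "-->m"
    return (
        "start n=node:node_auto_index(noscenda_name={target}) "
        "match %s return count(*);" % pattern
    )


# Payload in the shape used in the question; "person3089" is the example id from there.
payload = {"statement": fixed_hops_query(4), "parameters": {"target": "person3089"}}
```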

Fulltext indexes are also more expensive than exact indexes, so keep the exact auto-index for noscenda_name.

I can't get your importer to run; it fails at some point. Perhaps you can share the finished neo4j database?

python importer.py
reading rels
reading nodes
delete old
Traceback (most recent call last):
  File "importer.py", line 9, in <module>
    g.query('match n-[r]->m delete r;')
  File "/Users/mh/java/neo/neo4j-experiements/neo4jconnector.py", line 99, in query
    return self.call(payload)
  File "/Users/mh/java/neo/neo4j-experiements/neo4jconnector.py", line 71, in call
    self.transactionurl = result.headers['location']
  File "/Library/Python/2.7/site-packages/requests-1.2.3-py2.7.egg/requests/structures.py", line 77, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'location'
