Can't reproduce/verify the performance claims in the Graph Databases and Neo4j in Action books

Question

UPDATE I have put up a follow-up question that contains updated scripts and a clearer setup on neo4j performance compared to mysql (how can it be improved?). Please continue there. /UPDATE

I have some problems verifying the performance claims made in the "Graph Databases" book (page 20) and in "Neo4j in Action" (chapter 1).

To verify these claims I created a sample dataset of 100000 'person' entries with 50 'friends' each, and tried to query for, e.g., friends 4 hops away. I used the very same dataset in mysql. For friends of friends over 4 hops, mysql returns in 0.93 secs, while neo4j needs 65-75 secs (on repeated calls).
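
Both the SQL join and the Cypher query count paths, not distinct people, so the amount of work grows as 50^hops. A small worked calculation, assuming every person has exactly 50 outgoing friend entries (the description above says "50 friends each"). The SQL join counts every combination. Cypher's relationship-uniqueness rule can drop a few paths that reuse an edge, so treat these numbers as upper bounds:

```python
# Worked arithmetic, ASSUMING exactly 50 outgoing friend edges per person.
FRIENDS_PER_PERSON = 50


def paths_at_depth(friends: int, hops: int) -> int:
    """Upper bound on directed paths of exactly `hops` edges from one start
    node when every node has `friends` outgoing edges."""
    return friends ** hops


for hops in range(1, 5):
    print(hops, paths_at_depth(FRIENDS_PER_PERSON, hops))
```

This prints 50, 2500, 125000 and 6250000 for 1 to 4 hops. So a single 4-hop count from one person walks up to about 6.25 million paths, 50 times as many as the 3-hop count.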

How can I improve this dismal result and verify the claims made in the books?

I run the whole setup on an i5-3570K with 16GB RAM, using Ubuntu 12.04 64bit, java version "1.7.0_25", mysql 5.5.31 and neo4j-community-2.0.0-M03 (I get a similar outcome with 1.9).

All code/sample data can be found on https://github.com/jhb/neo4j-experiements/ (to be used with 2.0.0). The resulting sample data in different formats can be found on https://github.com/jhb/neo4j-testdata.

To use the scripts you need a python with mysql-python, requests and simplejson installed.

  • The dataset is created with friendsdata.py and stored in friends.pickle
  • friends.pickle is imported into neo4j with import_friends_neo4j.py
  • friends.pickle is imported into mysql with import_friends_mysql.py
  • I add indexes on t_user_friend.* in mysql
  • I add "create index on :node(noscenda_name)" in neo4j
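
The first step above, generating one dataset that both importers read, can be sketched as follows. This is a hypothetical stand-in, not the repository's friendsdata.py. The function names and the dict-of-lists pickle layout are assumptions:

```python
import pickle
import random


def make_friends(num_people: int, friends_each: int, seed: int = 0) -> dict:
    """Map 'personN' -> list of `friends_each` distinct other people.

    Hypothetical stand-in for friendsdata.py: the real script's pickle
    layout is not shown, so this dict-of-lists format is an assumption.
    """
    rng = random.Random(seed)  # fixed seed: rerunning yields identical data
    names = [f"person{i}" for i in range(num_people)]
    friends = {}
    for i, name in enumerate(names):
        # sample from the num_people - 1 *other* indices, then shift
        # indices >= i up by one so a person never befriends themselves
        picks = rng.sample(range(num_people - 1), friends_each)
        friends[name] = [names[j if j < i else j + 1] for j in picks]
    return friends


def save_friends(path: str, num_people: int = 100_000, friends_each: int = 50) -> None:
    """Pickle the generated dataset so both importers read the same file."""
    with open(path, "wb") as fh:
        pickle.dump(make_friends(num_people, friends_each), fh)
```

Calling `save_friends("friends.pickle")` would produce a file of the same shape as described: 100000 people with 50 friends each. The fixed seed keeps reruns comparable.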

To make life a bit easier the friends.*.bz2 contain sql and cypher statements to create those datasets in mysql and neo4j 2.0 M3.

I first warm mysql up by querying:

select count(distinct name) from t_user;
select count(distinct name) from t_user;

Then, for the real measurement, I do

python query_friends_mysql.py 4 10

This creates the following sql statement (with changing t_user.names):

select 
    count(*)
from
    t_user,
    t_user_friend as uf1, 
    t_user_friend as uf2, 
    t_user_friend as uf3, 
    t_user_friend as uf4
where
    t_user.name='person8601' and 
    t_user.id = uf1.user_1 and
    uf1.user_2 = uf2.user_1 and
    uf2.user_2 = uf3.user_1 and
    uf3.user_2 = uf4.user_1;
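
The join above generalizes to any hop count: one extra t_user_friend alias and one extra chaining condition per hop. A sketch of building it, using a hypothetical helper (query_friends_mysql.py may construct its SQL differently). The name is left as a %s placeholder, which is MySQLdb's parameter style, so the driver does the quoting:

```python
def friends_sql(hops: int) -> str:
    """Build the n-hop self-join shown above (hypothetical helper).

    Pass the person name separately, e.g. cursor.execute(sql, (name,)).
    """
    if hops < 1:
        raise ValueError("hops must be >= 1")
    tables = ["t_user"] + [f"t_user_friend as uf{i}" for i in range(1, hops + 1)]
    conds = ["t_user.name = %s", "t_user.id = uf1.user_1"]
    # chain the hops: the friend reached at hop i is the user at hop i+1
    conds += [f"uf{i}.user_2 = uf{i + 1}.user_1" for i in range(1, hops)]
    return (
        "select count(*) from "
        + ", ".join(tables)
        + " where "
        + " and ".join(conds)
    )
```

`friends_sql(4)` yields the same four-alias join as the statement above, with the literal name replaced by a placeholder.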

and repeats this 4 hop query 10 times. The queries need around 0.95 secs each. Mysql is configured to use a key_buffer of 4G.

I have modified neo4j.properties:

neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=250M

and the neo4j-wrapper.conf:

wrapper.java.initmemory=2048
wrapper.java.maxmemory=8192
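
Rough arithmetic can check whether the mapped-memory values above can hold the node and relationship stores. This sketch ASSUMES fixed record sizes of 14 bytes per node and 33 bytes per relationship, taken from the Neo4j 2.0 manual's capacity-planning figures as I understand them (verify these for your exact version). It also assumes one relationship per friend entry:

```python
# Back-of-the-envelope store sizes. ASSUMED record sizes (Neo4j 2.0 figures);
# check the manual for the version actually in use.
NODE_RECORD_BYTES = 14
REL_RECORD_BYTES = 33
MIB = 1024 * 1024

people = 100_000
friends_each = 50
relationships = people * friends_each  # one relationship per friend entry

node_store = people * NODE_RECORD_BYTES
rel_store = relationships * REL_RECORD_BYTES

print(f"nodestore:         {node_store / MIB:.1f} MiB (mapped: 25M)")
print(f"relationshipstore: {rel_store / MIB:.1f} MiB (mapped: 250M)")
```

Under these assumptions the stores come to about 1.3 MiB and 157.4 MiB. Both fit inside the 25M and 250M mappings, so the mapped-memory settings are unlikely to explain a 65-second query on their own.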

I warm neo4j up with

start n=node(*) return count(n.noscenda_name);
start r=relationship(*) return count(r);

Then I start using the transactional http endpoint (but I get the same results using the neo4j-shell).
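
Building the request body with json.dumps, rather than by hand, guarantees well-formed JSON. This sketch assumes the 2.0 transactional endpoint's documented envelope {"statements": [{"statement": ..., "parameters": ...}]}. The helper name is hypothetical, and the statement text mirrors the 2.0.0-M03 query used here:

```python
import json


def tx_payload(hops: int, target: str) -> str:
    """JSON body for Neo4j 2.0's transactional HTTP endpoint (hypothetical helper)."""
    statement = (
        f"match n:node-[r*{hops}..{hops}]->m:node "
        # plain (non-f) string: {target} stays a Cypher parameter
        "where n.noscenda_name={target} return count(r);"
    )
    body = {
        "statements": [
            {"statement": statement, "parameters": {"target": target}}
        ]
    }
    return json.dumps(body)  # closes every brace, unlike hand-built JSON
```

Assuming a default local server, the body could be posted with something like `requests.post("http://localhost:7474/db/data/transaction/commit", data=tx_payload(4, "person3089"), headers={"Content-Type": "application/json"})`.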

I warm it up further with

./bin/python query_friends_neo4j.py 3 10

This creates a query of the form (with varying person ids):

{"statement": "match n:node-[r*3..3]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}}

After the 7th call or so, each call needs around 0.7-0.8 secs.
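
Since the first few calls are slower, it helps to keep warm-up runs out of the timings. A minimal harness, with `time_calls` and `run_query` as hypothetical names. `run_query` stands for any zero-argument callable that issues one query, either SQL or Cypher:

```python
import time
from typing import Callable, List


def time_calls(run_query: Callable[[], object], repeats: int, warmup: int = 0) -> List[float]:
    """Run `warmup` untimed calls, then return the durations of `repeats` timed calls."""
    for _ in range(warmup):
        run_query()  # lets caches fill; deliberately not measured
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations
```

Reporting `statistics.median(time_calls(q, repeats=10, warmup=7))` for both databases gives a steady-state figure that a single slow call cannot skew.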

Now for the real thing (4 hops) I do

./bin/python query_friends_neo4j.py 4 10

which creates

{"statement": "match n:node-[r*4..4]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}}

and each call takes between 65 and 75 secs.

I'd really like to see the claims in the books be reproducible and correct, and neo4j be faster than mysql instead of orders of magnitude slower.

But I don't know what I am doing wrong... :-(

So, my big hopes are:

  • I haven't set neo4j's memory settings correctly
  • the query I use for neo4j is completely wrong

Any suggestions to get neo4j up to speed are highly welcome.

Many thanks

Joerg

Answer

2.0 has not been performance-optimized at all, so you should use 1.9.2 for comparison. (If you use 2.0, did you create an index for n.noscenda_name?)

You can check the query plan with profile start ....

With 1.9 please use a manual index or node_auto_index for noscenda_name.

Can you try these queries:

start n=node:node_auto_index(noscenda_name={target})
match n-->()-->()-->m
return count(*);
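
The pattern n-->()-->()-->m spans three relationships, so the 4-hop comparison from the question needs one more -->() step. A sketch that builds the explicit fixed-length pattern for any hop count, using a hypothetical helper. It keeps the answer's 1.9 start clause with the auto-index lookup:

```python
def fixed_length_query(hops: int) -> str:
    """Cypher 1.9 query with an explicit chain of `hops` relationships
    (hypothetical helper, in the style of the query above)."""
    if hops < 1:
        raise ValueError("hops must be >= 1")
    # hops - 1 anonymous intermediate nodes between n and m
    pattern = "n" + "-->()" * (hops - 1) + "-->m"
    return (
        "start n=node:node_auto_index(noscenda_name={target})\n"
        f"match {pattern}\n"
        "return count(*);"
    )
```

`fixed_length_query(3)` reproduces the query above, and `fixed_length_query(4)` gives `match n-->()-->()-->()-->m`. With 1.9's REST API, the text would go in the "query" field and {"target": ...} in "params" of a POST to /db/data/cypher. Treat that endpoint shape as an assumption to check against your server's documentation.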

Fulltext indexes are also more expensive than exact indexes, so keep the exact auto-index for noscenda_name.

I can't get your importer to run; it fails at some point. Perhaps you can share the finished neo4j database?

python importer.py
reading rels
reading nodes
delete old
Traceback (most recent call last):
  File "importer.py", line 9, in <module>
    g.query('match n-[r]->m delete r;')
  File "/Users/mh/java/neo/neo4j-experiements/neo4jconnector.py", line 99, in query
    return self.call(payload)
  File "/Users/mh/java/neo/neo4j-experiements/neo4jconnector.py", line 71, in call
    self.transactionurl = result.headers['location']
  File "/Library/Python/2.7/site-packages/requests-1.2.3-py2.7.egg/requests/structures.py", line 77, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'location'
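
requests' header mapping is case-insensitive, so the KeyError means the response carried no Location header at all. One plausible cause is a failed request: a non-2xx status, or a server without the 2.0 transactional endpoint. A stricter lookup surfaces the real problem instead of a bare KeyError. The sketch below is a hypothetical helper, not code from neo4jconnector.py. It assumes the decoded JSON body may carry an "errors" list, as the 2.0 transactional endpoint reports failures that way:

```python
from typing import Mapping


def transaction_url(status: int, headers: Mapping[str, str], body: dict) -> str:
    """Return the Location of a newly opened transaction, or raise with
    the HTTP status and any server-reported errors (hypothetical helper)."""
    errors = body.get("errors") or []
    # header names are case-insensitive; normalise a plain dict too
    lowered = {k.lower(): v for k, v in headers.items()}
    if errors or "location" not in lowered:
        raise RuntimeError(
            f"could not open transaction (HTTP {status}): errors={errors!r}"
        )
    return lowered["location"]
```

In the connector, this would replace the direct `result.headers['location']` access, called with `result.status_code`, `result.headers` and the parsed JSON.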
