How to configure Neo4j to make it faster?


Problem description

I am trying to use Neo4j for some experiments on social networking (SNS) data. I have created a random graph consisting of 1 million users and 100 thousand items, where each user has about 100 friends and 100 favourite items. So there are about 1 million nodes and 200 million relationships in the graph, and the graph files take up 4.8 GB. Each node has only an id, and I have created an index on it. I have used the Java API to set up a small cluster of three VMs to maintain this graph. Each VM has 16 GB of RAM and an Intel Xeon 2.00 GHz CPU (8 cores). Below is some of the configuration:

config.put("neostore.nodestore.db.mapped_memory", "150M");
config.put("neostore.relationshipstore.db.mapped_memory", "5G");
config.put("neostore.propertystore.db.mapped_memory", "100M");
config.put("neostore.propertystore.db.strings.mapped_memory", "130M");
config.put("neostore.propertystore.db.arrays.mapped_memory", "130M");
config.put("node_auto_indexing", "true");
config.put("use_memory_mapped_buffers", "true");
config.put("neostore.propertystore.db.index.keys.mapped_memory", "150M");
config.put("neostore.propertystore.db.index.mapped_memory", "150M");

I use the gcr cache_type. I warm up the graph simply by traversing it:

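// Touch every node and relationship once so that they are loaded into the cache.
for ( Node n : GlobalGraphOperations.at(db).getAllNodes() ) {
    n.getPropertyKeys();
    for ( Relationship relationship : n.getRelationships() ) {
        // The result is not used; the call only forces the relationship to load.
        relationship.getStartNode();
    }
}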

The Cypher query:

start user=node:users({key}={value}) match user-[:FRIEND]->(friend)-[:LIKES]->(item) return item, collect(friend), count(0) order by count(0) desc limit 32;

This query finds the items that a user's friends like most. I run the jar with the command:

java -d64 -server -XX:+UseConcMarkSweepGC -XX:+UseNUMA -Xms10752m -Xmx10752m -Xmn2688m -jar Neo4J-1.0-SNAPSHOT.jar

Now, my experiment results:

(1) Single thread: each query takes about 70 ms on average.
(2) 8 threads: each query takes about 160 ms on average, and many queries take more than 500 ms. The throughput is about 50 requests per second.

I want to improve the performance, but I don't know how. It seems the RAM is not enough to hold all the data, is that right? Besides, I have tried the soft and strong cache_type settings, and the RAM fills up quickly during warm-up.
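A rough check of that suspicion can be done with only the numbers given above. The sketch below sums the mapped_memory settings and adds the 10752 MB heap from the java command. It assumes that the memory-mapped buffers live outside the JVM heap when use_memory_mapped_buffers is true, so the two figures add up; the class and method names are hypothetical, and memory needed by the operating system and the JVM itself is not counted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Neo4j: adds up the memory the setup asks for.
class MemoryBudget {

    /** Parses sizes such as "150M" or "5G" into megabytes. */
    static long toMegabytes(String size) {
        String s = size.trim().toUpperCase();
        long value = Long.parseLong(s.substring(0, s.length() - 1));
        char unit = s.charAt(s.length() - 1);
        switch (unit) {
            case 'M': return value;
            case 'G': return value * 1024;
            default: throw new IllegalArgumentException("Unsupported unit in: " + size);
        }
    }

    /** Sums every setting whose key ends with "mapped_memory". */
    static long totalMappedMegabytes(Map<String, String> config) {
        long total = 0;
        for (Map.Entry<String, String> e : config.entrySet()) {
            if (e.getKey().endsWith("mapped_memory")) {
                total += toMegabytes(e.getValue());
            }
        }
        return total;
    }

    /** The settings exactly as listed in the question. */
    static Map<String, String> questionConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("neostore.nodestore.db.mapped_memory", "150M");
        config.put("neostore.relationshipstore.db.mapped_memory", "5G");
        config.put("neostore.propertystore.db.mapped_memory", "100M");
        config.put("neostore.propertystore.db.strings.mapped_memory", "130M");
        config.put("neostore.propertystore.db.arrays.mapped_memory", "130M");
        config.put("node_auto_indexing", "true");
        config.put("use_memory_mapped_buffers", "true");
        config.put("neostore.propertystore.db.index.keys.mapped_memory", "150M");
        config.put("neostore.propertystore.db.index.mapped_memory", "150M");
        return config;
    }

    public static void main(String[] args) {
        long mappedMb = totalMappedMegabytes(questionConfig()); // 5930
        long heapMb = 10752;        // from -Xms10752m -Xmx10752m
        long ramMb = 16L * 1024;    // 16 GB per VM
        System.out.println("mapped memory: " + mappedMb + " MB");
        System.out.println("heap + mapped: " + (heapMb + mappedMb) + " MB of " + ramMb + " MB RAM");
    }
}
```

Under that assumption, heap plus mapped memory comes to 16682 MB, slightly more than the 16384 MB available on each VM, before the operating system takes its share.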

Please help me and show me how to improve it. Thanks a lot.

Recommended answer

If the heap size / available RAM is too small to hold the full dataset in the object cache, you can go with the enterprise edition. By putting a load balancer in front of your n Neo4j instances that routes all requests for a certain part of the graph to the same instance, you are effectively sharding the object cache. Jim Webber has blogged about this approach: http://jim.webber.name/2011/02/scaling-neo4j-with-cache-sharding-and-neo4j-ha/
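As a minimal sketch of the routing idea, the class below always maps the same user id to the same instance, so each instance's cache only has to stay warm for its own share of users. The modulo rule, the class name and the host names are assumptions made for illustration; they are not taken from the blog post, and a real load balancer might use a different scheme.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router: sends every request for a given user to one fixed instance.
class CacheShardRouter {
    private final List<String> instances;

    CacheShardRouter(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("At least one instance is required");
        }
        this.instances = new ArrayList<>(instances);
    }

    /** Returns the instance responsible for the given user id. */
    String instanceFor(long userId) {
        // floorMod keeps the index non-negative even for negative ids.
        int index = (int) Math.floorMod(userId, (long) instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        List<String> hosts = new ArrayList<>();
        hosts.add("neo4j-a"); // hypothetical host names
        hosts.add("neo4j-b");
        hosts.add("neo4j-c");
        CacheShardRouter router = new CacheShardRouter(hosts);
        System.out.println("user 42 is served by " + router.instanceFor(42));
    }
}
```

Plain modulo routing reshuffles most users when an instance is added or removed, so it is only a starting point for a fixed three-VM cluster like the one in the question.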

For performance-critical queries, it might be worth refactoring the Cypher query into an equivalent that uses the traversal API, or even going down to the core API.
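To show the shape of such a hand-written version without depending on a particular Neo4j release, here is a sketch of the same two-hop count in plain Java. The two maps are stand-ins for following FRIEND and LIKES relationships from a node; in real code those lookups would iterate relationships through the core API. The class name, the tie-breaking by item id and the omission of collect(friend) are assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for: user -[:FRIEND]-> friend -[:LIKES]-> item, counted per item.
class FriendFavourites {

    /** Returns up to `limit` (item, count) pairs, most liked first. */
    static List<Map.Entry<Long, Integer>> topItems(long user,
                                                   Map<Long, List<Long>> friends,
                                                   Map<Long, List<Long>> likes,
                                                   int limit) {
        Map<Long, Integer> counts = new HashMap<>();
        // First hop: every friend of the user.
        for (long friend : friends.getOrDefault(user, Collections.<Long>emptyList())) {
            // Second hop: every item that friend likes.
            for (long item : likes.getOrDefault(friend, Collections.<Long>emptyList())) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        List<Map.Entry<Long, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken by item id so the order is deterministic.
        ranked.sort(Map.Entry.<Long, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<Long, Integer>comparingByKey()));
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

Calling topItems(userId, friends, likes, 32) corresponds to the limit 32 in the query. With about 100 friends and 100 liked items each, this touches roughly 10,000 relationships per request, which is the work any implementation of the query has to do.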
