Neo4j partition


Question

Is there a way to physically separate Neo4j partitions? Meaning the following query would go to node1:

Match (a:User:Facebook)

While this query would go to another node (perhaps hosted on Docker):

Match (b:User:Google)

Here is the use case: I want to store data for several clients under Neo4j, hopefully many of them. I'm not yet sure what the best design is, but it has to satisfy a few conditions:

  1. A Cypher query should never return mixed data from different clients (it is really hard to ensure that no developer will ever forget the `:Partition1` label in a query).
  2. One client's performance should not affect another's. For example, if one client has a huge amount of data while another has very little, or if many "heavy" queries from one client are currently running, I would not want the other client's "lite" queries to suffer from slow performance.
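To illustrate condition 1, here is a minimal sketch (the `:Client1` tenant label and `name` property are hypothetical, not from the original question): if every node carries a tenant label, a single forgotten label silently mixes tenants.

```cypher
// Intended: return only client 1's users (hypothetical :Client1 tenant label)
MATCH (u:User:Client1) RETURN u.name;

// Bug: the tenant label was forgotten, so this returns users of ALL clients
MATCH (u:User) RETURN u.name;
```

Nothing in the database itself stops the second query; enforcing isolation is left entirely to developer discipline, which is exactly the concern raised above.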

In other words, I think storing everything under one node will, at some point in the future, run into scalability problems as I take on more clients.

By the way, is it common to run a few separate clusters?

Also, what is the advantage of partitioning over creating a different label for each client? For example: Users_client_1, Users_client_2, etc.
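The label-per-client idea from the question can be sketched like this (the property names are illustrative; this is a sketch of the approach being asked about, not a recommendation):

```cypher
// One label per tenant: the client id is baked into the label itself
CREATE (:Users_client_1 {name: 'Alice'});
CREATE (:Users_client_2 {name: 'Bob'});

// Every query must then target the tenant-specific label explicitly
MATCH (u:Users_client_1) RETURN u.name;
```

Note that all of this data still lives in the same store on the same machines, so the approach isolates query results (if the label is always used) but not resource consumption or performance.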

Answer

Short answer: no, there isn't.

Neo4j has high-availability (HA) clusters, where you can replicate your entire graph onto many machines and then serve many requests quickly against those copies, but it does not partition a really huge graph so that some of it is stored here and other parts there, connected by a single query mechanism.

More detailed answer: graph partitioning is a hard problem, subject to ongoing research. You can read more about it on Wikipedia, but the gist is that when you create partitions you split your graph across multiple locations and then have to deal with the complication of relationships that cross partitions. Crossing a partition is an expensive operation, so the real question when partitioning is: how do you partition so that the need to cross partitions within a query arises as infrequently as possible?

That's a really hard question, since it depends not only on the data model but also on the access patterns, which may change over time.

Here's how bad the situation is (quote stolen from Wikipedia):

Typically, graph partition problems fall under the category of NP-hard problems. Solutions to these problems are generally derived using heuristics and approximation algorithms.[3] However, uniform graph partitioning or a balanced graph partition problem can be shown to be NP-complete to approximate within any finite factor.[1] Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist,[4] unless P=NP. Grids are a particularly interesting case since they model the graphs resulting from Finite Element Model (FEM) simulations. When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable fully polynomial algorithms exist for these graphs.

Not to leave you with too much doom and gloom: plenty of people have partitioned big graphs. Facebook and Twitter do it every day, so you can read about FlockDB on the Twitter side or avail yourself of the relevant Facebook research. But to summarize and cut to the chase: it depends on your data, and most people who partition design a custom partitioning strategy; it's not something the software does for them.

Finally, other architectures (such as Apache Giraph) can auto-partition in some sense; if you store a graph on top of Hadoop, and Hadoop already automagically scales across a cluster, then technically this partitions your graph for you, automagically. Cool, right? Well... cool until you realize that you still have to execute graph-traversal operations all over the place, which may perform very poorly because all of those partitions have to be traversed; that is exactly the performance situation you are usually trying to avoid by partitioning wisely in the first place.
