Neo4j: how do I delete all duplicate relationships in the database through Cypher?


Question

I have a huge database with a very large number of nodes (10 million+). There is only one type of relationship in the whole database. However, many pairs of nodes have duplicated relationships between them. What I currently have is a Cypher script that finds all the pairs with duplicates, and a Python script that then runs through and cleans up each one (leaving just one unique relationship between those nodes).

match (a)-[r]->(b) with a,b, count(*) as c where c>1 return a.pageid, b.pageid, c LIMIT 100000;

This works fairly well for a small database, but when I run it on a big one it eventually blows up with an exception about running out of memory on the heap (moving to a bigger and bigger box doesn't help).

So, the question is two-fold: 1) Is there any sort of indexing I can put on relationships (right now there is none) that would help speed this up? 2) Is there a Cypher query that can (quickly, or at least reliably) delete all the duplicate relationships in the database, leaving just one unique relationship for each node pair that already has a relationship between them?

P.S. I'm running Neo4j 2.0.1 on an Ubuntu (12something) AWS box.

P.P.S. I realize there is this answer: stackoverflow, however what he's asking is something more specific (against 2 already known nodes), and the answer that has full database covered doesn't run anymore (syntax change?)

Thanks in advance!

Answer

What error do you get with the db global query in the linked SO question? Try substituting | for : in the FOREACH; that's the only breaking syntax difference that I can see. The 2.x way to say the same thing, adapted to your having only one relationship type in the db, might be:

MATCH (a)-[r]->(b)
WITH a, b, TAIL (COLLECT (r)) as rr
FOREACH (r IN rr | DELETE r)
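To make the intent of TAIL (COLLECT (r)) concrete, here is a minimal pure-Python sketch of the same grouping logic. It does not talk to Neo4j; the function name `duplicate_relationships` and the `(rel_id, start, end)` tuple layout are assumptions made up for illustration. Relationships are grouped by directed node pair, the first one in each group is kept, and the rest (the "tail") are marked for deletion.

```python
from collections import defaultdict


def duplicate_relationships(rels):
    """Return the ids of all but the first relationship per directed (start, end) pair.

    `rels` is an iterable of hypothetical (rel_id, start_node, end_node) tuples.
    """
    groups = defaultdict(list)
    for rel_id, start, end in rels:
        # Same role as grouping by a, b in the WITH clause: direction matters.
        groups[(start, end)].append(rel_id)

    to_delete = []
    for ids in groups.values():
        # ids[1:] plays the role of TAIL(COLLECT(r)); it is empty when there is no duplicate.
        to_delete.extend(ids[1:])
    return to_delete


rels = [(1, "a", "b"), (2, "a", "b"), (3, "b", "a"), (4, "a", "b"), (5, "a", "c")]
print(duplicate_relationships(rels))  # [2, 4]
```

Note that (b)->(a) is treated as a different pair from (a)->(b), just as in the directed MATCH pattern.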

I think the WITH pipe will carry the empty tails when there is no duplicate, and I don't know how expensive it is to loop through an empty collection. My sense is that the place to introduce the limit is with a filter after the WITH, something like

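MATCH (a)-[r]->(b)
WITH a, b, TAIL (COLLECT (r)) as rr
WHERE length(rr) > 0
// Within a single WITH, LIMIT has to come before WHERE, so the limit goes in a second WITH
WITH rr LIMIT 100000
FOREACH (r IN rr | DELETE r)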

Since this query doesn't touch properties at all (as opposed to yours, which returns properties for (a) and (b)) I don't think it should be very memory heavy for a medium graph like yours, but you will have to experiment with the limit.
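Because the limited query only cleans up a bounded number of node pairs per run, it has to be re-run until it deletes nothing. The following is a rough pure-Python simulation of that loop, not driver code: `delete_one_batch` and `dedupe_in_batches` are hypothetical names, and the in-memory list of `(rel_id, start, end)` tuples stands in for the database.

```python
from collections import defaultdict


def delete_one_batch(rels, limit):
    """Simulate one run of the limited query on an in-memory relationship list.

    At most `limit` node pairs with duplicates are cleaned up per call.
    Returns (remaining_rels, number_of_deleted_relationships).
    """
    groups = defaultdict(list)
    for rel_id, start, end in rels:
        groups[(start, end)].append(rel_id)

    doomed = set()
    picked = 0
    for ids in groups.values():
        if len(ids) > 1 and picked < limit:
            doomed.update(ids[1:])  # keep the first, delete the tail
            picked += 1

    remaining = [rel for rel in rels if rel[0] not in doomed]
    return remaining, len(doomed)


def dedupe_in_batches(rels, limit):
    """Repeat the limited cleanup until a run deletes nothing."""
    batches = 0
    while True:
        rels, deleted = delete_one_batch(rels, limit)
        if deleted == 0:
            return rels, batches
        batches += 1


rels = [(1, "a", "b"), (2, "a", "b"), (3, "c", "d"),
        (4, "c", "d"), (5, "c", "d"), (6, "e", "f")]
remaining, batches = dedupe_in_batches(rels, limit=1)
print(remaining)  # [(1, 'a', 'b'), (3, 'c', 'd'), (6, 'e', 'f')]
print(batches)    # 2
```

Against a real database, the outer loop would live in the client script and stop once a run reports zero deleted relationships; a smaller limit means more runs but less work per transaction.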

If memory is still a problem, then any way of limiting the nodes you work with (without touching properties) is also a good idea. If your nodes are distinguishable by label, try running the query for one label at a time

MATCH (a:A)-[r]->(b) //etc..
MATCH (a:B)-[r]->(b) //etc..
