py2neo - Neo4j - System Error - Create Batch Nodes/Relationships


Question

I am attempting to batch-create nodes and relationships, but the batch creation fails. The traceback is at the end of this post.

Note that the code works with a smaller subset of nodes. It fails once the number of relationships becomes very large, and it is unclear at what limit this occurs.

  • I am wondering whether I need to increase ulimit above 40,000 open files.
  • I read somewhere that people were running into Xstream issues with the REST API while doing batch creates. It is unclear whether the problem lies with py2neo, with the Neo4j server tuning/configuration, or with Python itself. Any guidance would be greatly appreciated.

One cluster within the data set ends up with around 625,525 relationships among 700+ nodes. Total relationships will exceed 1M. The machine is an Apple MacBook Pro Retina (x86_64) running Ubuntu 13.04, with an SSD and 8 GB of memory.

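  • Neo4j: configured with auto_indexing & auto_relationships set to ON
  • Nodes clustered/grouped via Python pandas DataFrame.groupby()
  • Nodes: contain 3 properties
  • Relationship properties: 1 -> IN & relationship established
  • ulimit set to 40,000 open files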

https://github.com/alienone/OSINT/blob/master/MANDIANTAPT/spitball.py

  • OS: Ubuntu 13.04
  • Python version: 2.7.5
  • py2neo version: 1.5.1
  • Java version: 1.7.0_25-b15
  • Neo4j version: Community Edition 1.9.2

Traceback (most recent call last):
  File "/home/alienone/Programming/Python/OSINT/MANDIANTAPT/spitball.py", line 63, in <module>
    main()
  File "/home/alienone/Programming/Python/OSINT/MANDIANTAPT/spitball.py", line 59, in main
    graph_db.create(*sorted_nodes)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/neo4j.py", line 420, in create
    return batch.submit()
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/neo4j.py", line 2123, in submit
    for response in self._submit()
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/neo4j.py", line 2092, in _submit
    for id, request in enumerate(self.requests)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/rest.py", line 428, in _send
    return self._client().send(request)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/rest.py", line 365, in send
    return Response(request.graph_db, rs.status, request.uri, rs.getheader("Location", None), rs_body)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/rest.py", line 279, in __init__
    raise SystemError(body)
SystemError: None

Process finished with exit code 1

Recommended answer

I had a similar issue. One way to deal with it is to call batch.submit() on chunks of your data rather than on the whole data set. This is slower, of course, but splitting one million nodes into chunks of 5,000 is still faster than adding every node separately.

I use a small helper class to do this. Note that all my nodes are indexed: https://gist.github.com/anonymous/6293739
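The linked gist is not reproduced here. As a rough, library-independent sketch of the chunking idea, something like the following could work. The names chunked, submit_in_chunks and submit_chunk are hypothetical and are not part of py2neo; submit_chunk stands for whatever call actually sends one batch to the server.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def submit_in_chunks(items, submit_chunk, size=5000):
    """Call `submit_chunk` once per chunk; return the number of calls made.

    `submit_chunk` is a placeholder (an assumption of this sketch) for the
    function that sends a single batch, so each request stays small.
    """
    calls = 0
    for chunk in chunked(items, size):
        submit_chunk(chunk)
        calls += 1
    return calls
```

With the call shown in the traceback, usage might look like submit_in_chunks(sorted_nodes, lambda chunk: graph_db.create(*chunk)). One caveat, stated as an assumption: if relationships refer to nodes by their position within the same batch, those references would not carry across chunks, which may be why the answer relies on indexed nodes.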
