Batching in py2neo

Views: 249; posted: 2020/5/17 0:38:53; tags: neo4j, py2neo


Problem description

I have started working with Neo4j and have been exploring batch processing, but I am having problems creating relationships between nodes.

My problem is the following. I have a list of websites and users that I read from a file. The file may contain repeated websites and users, and I do not want to insert new nodes for those repeated entries. Because the file is big, I want to batch the creation of nodes and relationships.

Basically, I have these two functions, which create nodes and relationships and add them to the batch.

from py2neo import neo4j, rel

graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
batch = neo4j.WriteBatch(graph_db)


def create_node(pvalue, svalue, node_type):
    # Queue a node creation in the batch and return the reference from batch.create.
    return batch.create({
        "pkey": pvalue,
        "skey": svalue,
        "type": node_type,
    })


def create_rel(from_node, type_label, to_node, fields):
    # Queue a relationship between the two nodes, carrying the account key.
    properties = {"ACCT_KEY": fields.ACCT_KEY}
    relation = rel(from_node, type_label, to_node, **properties)
    batch.create(relation)

Then, after using a dictionary to make sure I have not already created the nodes, I do:

node1 = create_node("ATTRIBUTE_1", "ATTRIBUTE_2", "WEBSITE")
node2 = create_node("ATTRIBUTE_3", "ATTRIBUTE_4", "USER")

create_rel(node1, "VISITED_BY", node2, fields)

I save the references to "node1" and "node2" in a dictionary, so when I want to create a relationship involving a website or a user that has already been registered, I do not create the node again but use the saved reference directly. I do this inside a loop and it works fine until, after a certain number of iterations, I do this:

batch.submit()
batch.clear()

When I then try to use references from previous batches, I get the following error:

Traceback (most recent call last):
    File "main.py", line 102, in <module>
        create_rel(cardholder, fraud_label, merchant,fields)
    File "main.py", line 33, in create_rel
        batch.create(relation)
    File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2775, in create
        "to": self._uri_for(entity.end_node)
    File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2613, in _uri_for
        uri = "{{{0}}}".format(self.find(resource))
    File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2604, in find
        raise ValueError("Request not found")
ValueError: Request not found
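
Reading the last frames of the traceback, a reference returned by `batch.create` appears to be resolved by searching the batch's pending requests, so a reference from a batch that has since been cleared can no longer be found. The following is only a toy model of that behaviour, not py2neo's source: `MiniBatch` and its internals are hypothetical stand-ins that borrow the method names seen in the traceback.

```python
class MiniBatch:
    """Hypothetical stand-in for a write batch; this is not py2neo code."""

    def __init__(self):
        self._requests = []

    def create(self, body):
        # Return a handle that is only meaningful while it sits in _requests.
        request = {"body": body}
        self._requests.append(request)
        return request

    def find(self, request):
        # Look the handle up by identity among the pending requests.
        for index, pending in enumerate(self._requests):
            if pending is request:
                return index
        raise ValueError("Request not found")

    def clear(self):
        # Drop all pending requests; old handles now point at nothing.
        self._requests = []


batch = MiniBatch()
website = batch.create({"type": "WEBSITE"})
position_before = batch.find(website)   # resolves while the request is pending

batch.clear()
try:
    batch.find(website)                 # same handle, but the batch was emptied
    error_after_clear = None
except ValueError as exc:
    error_after_clear = str(exc)
```

In this model the handle itself is unchanged after `clear()`; what disappears is the list it was looked up in, which matches the symptom of references working within one batch and failing in the next.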

I believe this happens because the references from previous batches are somehow lost and are no longer valid. I have tried to collect the IDs from the nodes and use those instead, but I cannot find out how to do it. Any help would be appreciated, thanks.

My Neo4j version is "2.0.3 community edition for Unix" and my py2neo version is 1.6.4.
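
One possible way to keep the dictionary usable across batches is to replace each batch-local reference with the concrete result once the batch has been submitted. The sketch below only shows that bookkeeping. It assumes that `submit()` returns one result per request, in request order, which I believe is how py2neo 1.6 behaves but which should be checked against the installed version. `StubBatch`, `flush`, `cache` and `pending` are hypothetical names, and the stub stands in for `neo4j.WriteBatch` so the logic can run without a server. In the stub a handle is simply the request's position; in py2neo the handle is a request object, so the position would have to be tracked separately (the `find` method in the traceback appears to return it).

```python
class StubBatch:
    """Hypothetical stand-in for neo4j.WriteBatch, for illustration only."""

    def __init__(self):
        self._requests = []

    def create(self, body):
        self._requests.append(body)
        return len(self._requests) - 1      # batch-local handle (a position)

    def submit(self):
        # Assumption: one result per request, in request order.
        return ["concrete:%s" % body["pkey"] for body in self._requests]

    def clear(self):
        self._requests = []


def flush(batch, cache, pending):
    """Submit the batch, then swap batch-local handles for concrete results."""
    results = batch.submit()
    for key, handle in pending.items():
        cache[key] = results[handle]
    pending.clear()
    batch.clear()


batch = StubBatch()
cache = {}      # key -> handle (current batch) or concrete entity (earlier batch)
pending = {}    # key -> handle, only for nodes created in the current batch

for key in ["site-a", "user-1", "site-a"]:
    if key not in cache:                    # skip repeated entries
        handle = batch.create({"pkey": key})
        cache[key] = handle
        pending[key] = handle

flush(batch, cache, pending)
```

After `flush`, every value in `cache` is a concrete entity rather than a handle tied to the emptied batch, so later relationships would be built from entities that should stay valid across batches.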
