Batching in py2neo

Problem description

I have started working with Neo4j and have been exploring batch processing, but I am having some problems creating relationships between nodes.

My problem is the following. I have a list of websites and users that I read from a file. The file may contain repeated websites and users, and I do not want to insert new nodes for those repeated entries. Because the file is big, I want to batch the creation of the nodes and relationships.

Basically, I have these two functions to create nodes and relationships and add them to the batch.

from py2neo import neo4j, rel

graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
batch = neo4j.WriteBatch(graph_db)

def create_node(pvalue, svalue, type):
    # Queue a node-creation job; the return value is a reference into this batch.
    return batch.create({
        "pkey": pvalue,
        "skey": svalue,
        "type": type,
    })


def create_rel(from_node, type_label, to_node, fields):
    # Queue a relationship-creation job between two nodes.
    properties = {"ACCT_KEY": fields.ACCT_KEY}
    relation = rel(from_node, type_label, to_node, **properties)
    batch.create(relation)

Then, after using a dictionary to make sure I have not already created the nodes, I do:

node1 = create_node("ATTRIBUTE_1", "ATTRIBUTE_2", "WEBSITE")
node2 = create_node("ATTRIBUTE_3", "ATTRIBUTE_4", "USER")

create_rel(node1, "VISITED_BY", node2, fields)

I save the references to "node1" and "node2" in a dictionary, so when I want to create a relationship involving a website or a user that has already been registered, I do not create the node again but use the saved reference directly. I do this inside a loop and it works fine, until I decide to do this after a certain number of iterations:

batch.submit()
batch.clear()

When I then try to use those references from previous batches, I get the following error:

Traceback (most recent call last):
    File "main.py", line 102, in <module>
        create_rel(cardholder, fraud_label, merchant,fields)
    File "main.py", line 33, in create_rel
        batch.create(relation)
    File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2775, in create
        "to": self._uri_for(entity.end_node)
    File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2613, in _uri_for
        uri = "{{{0}}}".format(self.find(resource))
    File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2604, in find
        raise ValueError("Request not found")
ValueError: Request not found

I believe this happens because the references from the previous batches are somehow lost and are no longer valid. I have tried to collect the IDs of the nodes and use those instead, but I cannot find out how to do it. Any help would be appreciated, thanks.

My Neo4j version is "2.0.3 community edition for Unix" and my py2neo version is 1.6.4.

Recommended answer

Apologies if this is not clear from the documentation, but references cannot extend across separate batches or batch submissions. The correct way to refer to previously created items is to parse the results from the first submission and pass the required entities into the second.

I would generally recommend using one batch per submission and avoiding reuse of the same batch object. Future versions of py2neo will likely prevent this anyway.
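
The advice above could be sketched roughly as follows. This is only an illustration, not part of py2neo: `BatchIndex` and its methods are hypothetical names, and the sketch assumes that `submit()` returns its results as a list in the same order in which the jobs were added to the batch. It keeps in-batch references and already-saved entities in separate dictionaries, and starts a fresh batch object after every submission.

```python
class BatchIndex(object):
    """Hypothetical helper: de-duplicates nodes by key across several batches."""

    def __init__(self, new_batch):
        # new_batch is a factory for batch objects, e.g.
        # lambda: neo4j.WriteBatch(graph_db)  (assumed usage)
        self._new_batch = new_batch
        self.batch = new_batch()
        self._pending = {}  # key -> (job position, reference valid in this batch only)
        self._saved = {}    # key -> entity returned by an earlier submit()
        self._count = 0     # number of jobs queued in the current batch

    def add(self, job):
        """Queue any job (such as a relationship) and return its batch reference."""
        ref = self.batch.create(job)
        self._count += 1
        return ref

    def node(self, key, properties):
        """Return the saved entity or pending reference for key, creating it if new."""
        if key in self._saved:
            return self._saved[key]
        if key in self._pending:
            return self._pending[key][1]
        position = self._count
        ref = self.add(properties)
        self._pending[key] = (position, ref)
        return ref

    def flush(self):
        """Submit the batch, keep the resulting entities, and start a new batch."""
        results = self.batch.submit()  # assumed: one result per job, in job order
        for key, (position, _) in self._pending.items():
            self._saved[key] = results[position]
        self._pending = {}
        self._count = 0
        self.batch = self._new_batch()  # one batch object per submission
```

In the loop from the question, `index.node(key, {...})` would replace the dictionary lookup plus `create_node`, `index.add(rel(node1, "VISITED_BY", node2, **properties))` would replace `batch.create(relation)`, and `index.flush()` would replace the `batch.submit()` / `batch.clear()` pair. After a flush, lookups return the entities from the submission results rather than stale batch references.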
