Neo4j: Assign unique values to all nodes matching query

Question

I want to implement a unique ID property on all nodes in my database, but I need to apply it to existing data. I'm using Ruby to generate the IDs and then running the Cypher query from there. I want to avoid running one query to find the nodes missing the property and then another query per node to set it, since that would require total_nodes + 1 queries.

Initially, I was thinking I could do something like this:

MATCH (n:`#{label}`) WHERE NOT HAS(n.my_id) SET n.my_id = '#{gen_method}' RETURN DISTINCT(true)

Of course, this wouldn't work: Ruby would call gen_method once while building the query string, and Neo4j would then try to set every matched node's ID to that one value.
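To make the problem concrete, here is a minimal sketch of what the interpolated string contains. The `gen_method` body (a `SecureRandom.uuid` call) and the `Person` label are assumptions for illustration only; the question does not say how the IDs are generated.

```ruby
require 'securerandom'

# Hypothetical stand-in for gen_method; the real generator is not shown in the question.
def gen_method
  SecureRandom.uuid
end

label = 'Person' # hypothetical label
query = "MATCH (n:`#{label}`) WHERE NOT HAS(n.my_id) SET n.my_id = '#{gen_method}' RETURN DISTINCT(true)"

# Interpolation ran when the string was built, so the query carries exactly one
# fixed UUID literal; every matched node would receive that same value.
uuid_pattern = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/
embedded_ids = query.scan(uuid_pattern)
puts embedded_ids.length
```

By the time Neo4j sees the query, it contains only a constant string, so nothing on the database side can ask Ruby for a fresh value per node.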

I'm now thinking it might be best to generate a large number of IDs in Ruby first and then include them in the Cypher query. I'd like to loop through the matched nodes and set the missing property on each one to the value at the corresponding index of the array. The logic should go something like this:

MATCH NODES WHERE GIVEN PROPERTY IS NULL, LIMIT TO 10,000
CREATE A COLLECTION OF THOSE NODES
SET NEW UUIDS ARRAY (provided by Ruby) AS "IDS_ARRAY"
FOR EACH NODE IN COLLECTION
  SET GIVEN PROPERTY VALUE = CORRESPONDING INDEX POSITION IN "IDS_ARRAY"
RETURN COUNT OF NODES WHERE GIVEN PROPERTY IS NULL

Based on the return value, the caller would know how many more times to repeat this. Cypher has a FOREACH loop, but how do I do this, especially if my unique_ids array starts out as a string inside the Cypher query?

unique_ids = ['first', 'second', 'third', 'etc']
i = 0
for node in matched_nodes
  node.my_id_property = unique_ids[i]
  i += 1
end
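As a plain-Ruby illustration of the pairing described above (outside the database), the same loop can be sketched with `zip`. The `Node` struct and the sample data are hypothetical stand-ins, not Neo4j.rb classes.

```ruby
# Hypothetical in-memory stand-in for a matched node; not a Neo4j.rb class.
Node = Struct.new(:name, :my_id_property)

unique_ids = ['first', 'second', 'third']
matched_nodes = [Node.new('a'), Node.new('b'), Node.new('c')]

# zip pairs each node with the ID at the same index, replacing the manual counter.
matched_nodes.zip(unique_ids).each do |node, id|
  node.my_id_property = id
end

matched_nodes.each { |node| puts "#{node.name}: #{node.my_id_property}" }
```

The open question is how to express this index-based pairing inside a single Cypher query.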

Is it even possible? Is there a different way of handling this that will work?

Recommended answer

Got it! I found http://java.dzone.com/articles/neo4j-cypher-creating, which provided a method for doing this, and http://jexp.de/blog/2014/03/quickly-create-a-100k-neo4j-graph-data-model-with-cypher-only/ pointed out the range function. My first draft of the Ruby code that performs this looks like this:

def add_ids_to(model)
  label = model.mapped_label_name
  property = model.primary_key

  loop do
    # Count the nodes that still lack the property.
    # has() is Neo4j 2.x syntax; later versions use exists() or IS NOT NULL.
    total = Neo4j::Session.query("MATCH (n:`#{label}`) WHERE NOT has(n.#{property}) RETURN COUNT(n) as ids").first.ids
    return if total == 0

    # Handle at most 900 nodes per query.
    to_set = [total, 900].min
    # One quoted ID literal per node, generated in Ruby.
    new_ids = Array.new(to_set) { "'#{new_id_for(model)}'" }

    # LIMIT is applied before COLLECT so only this batch is collected.
    # The inner FOREACH over a one-element list binds nodes[i] to a name that SET can use.
    Neo4j::Session.query("MATCH (n:`#{label}`) WHERE NOT has(n.#{property})
      WITH n LIMIT #{to_set}
      WITH COLLECT(n) as nodes, [#{new_ids.join(',')}] as ids
      FOREACH(i in range(0,#{to_set - 1})|
        FOREACH(node in [nodes[i]]|
          SET node.#{property} = ids[i]))
      RETURN distinct(true)")
  end
end
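As a variation on the query above, the IDs can be sent as a query parameter instead of being spliced into the query text. This is a sketch of a different technique, not the code from the answer: it assumes a Neo4j release newer than the one the answer targets (one that accepts `$param` syntax and `size()` on lists), and `build_batch_query`, the `Person` label, and the `uuid` property are hypothetical names. It only builds the query text and parameter hash; how you execute them depends on your driver.

```ruby
require 'securerandom'

# Builds the Cypher text and parameter hash for one batch (hypothetical helper).
# label and property are interpolated because Cypher cannot parameterize them;
# the IDs travel as the $ids parameter rather than as string literals.
def build_batch_query(label, property, ids)
  cypher = <<~CYPHER
    MATCH (n:`#{label}`) WHERE n.#{property} IS NULL
    WITH n LIMIT #{ids.length}
    WITH collect(n) AS nodes, $ids AS ids
    UNWIND range(0, size(nodes) - 1) AS i
    WITH nodes[i] AS node, ids[i] AS id
    SET node.#{property} = id
    RETURN count(node) AS updated
  CYPHER
  [cypher, { ids: ids }]
end

ids = Array.new(3) { SecureRandom.uuid } # SecureRandom.uuid is an assumed ID generator
cypher, params = build_batch_query('Person', 'uuid', ids)
puts cypher
puts params[:ids].length
```

Keeping the IDs out of the query string avoids manual quoting, and the returned `updated` count lets the caller stop once a batch updates zero nodes.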

I think that's all pretty readable. As for the queries themselves, I'm using Neo4j.rb and neo4j-core but skipping the Cypher DSL in this case. I'm limiting each query to a maximum of 900 nodes because that was the highest I could reliably go without running out of memory; tune this for your JVM heap size.
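To see how the 900-node cap translates into round trips, here is a small sketch that lists the batch sizes for a given number of nodes. `batch_sizes` is a hypothetical helper, and the default of 900 is simply the figure quoted above; adjust it to whatever your heap tolerates.

```ruby
# Hypothetical helper: splits a node count into per-query batch sizes.
def batch_sizes(total, max_batch = 900)
  full, rest = total.divmod(max_batch)
  sizes = Array.new(full, max_batch)
  sizes << rest if rest > 0
  sizes
end

sizes = batch_sizes(2000)
p sizes        # [900, 900, 200]
p sizes.length # 3 update queries, plus the count queries in between
```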
