Finding matches between start nodes for common sources in neo4j

Published: 2020/5/17 0:42:45. Tags: neo4j, cypher

Question

As part of some analysis, I am trying to find targets that have more than 80% common origins for one-hop paths.

The data is of the kind: all nodes are systems, and the only relationship that is relevant is ConnectsTo.

So I can write something like

match (n:system)-[r:ConnectsTo]->(m:system) return n,m

to get the sources n for a system m.

I am looking to find all systems m that have 80% or more common source systems.

Please advise how this could be done for all systems. I tried with collect but am afraid I couldn't write the proper iteration.

Recommended answer

Let's start by creating a simple example data set:

CREATE
  (s1:System {name:"s1"}), 
  (s2:System {name:"s2"}), 
  (s3:System {name:"s3"}), 
  (s4:System {name:"s4"}), 
  (s5:System {name:"s5"}), 
  (s1)-[:ConnectsTo]->(s3),
  (s1)-[:ConnectsTo]->(s4),
  (s2)-[:ConnectsTo]->(s3),
  (s2)-[:ConnectsTo]->(s4),
  (s2)-[:ConnectsTo]->(s5)

This gives the following graph: s1 connects to s3 and s4, and s2 connects to s3, s4 and s5.

We start from node pairs (m1 and m2) that have at least a single common source. We calculate:

the number of sources for each node (sources1Count and sources2Count)
the number of common sources (commonSources)

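Then we compare the number of common sources to the number of sources of each node. This could use a bit of fine-tuning, based on what you consider "80% common". The counts are integers, so the comparison has to happen in floating point: the query divides by the float literal 0.8, and if you compare a ratio of two counts instead, wrap one of them in toFloat to avoid integer division.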
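To sanity-check the rule before running it in the database, the same arithmetic can be reproduced outside Neo4j. The following is only a minimal Python sketch on the example data set; the function name similar_targets and the THRESHOLD constant are made up for illustration and are not part of the answer.

```python
from itertools import combinations

# Edges copied from the example data set as (source, target) pairs.
edges = [
    ("s1", "s3"), ("s1", "s4"),
    ("s2", "s3"), ("s2", "s4"), ("s2", "s5"),
]

THRESHOLD = 0.8  # assumed reading of "80% common", same constant as the query


def similar_targets(edges, threshold=THRESHOLD):
    # Collect the set of sources for every target.
    sources = {}
    for src, dst in edges:
        sources.setdefault(dst, set()).add(src)

    pairs = []
    # Sorted combinations yield each unordered pair once, like ID(m1) < ID(m2).
    for m1, m2 in combinations(sorted(sources), 2):
        common = len(sources[m1] & sources[m2])
        if common == 0:
            continue  # only pairs with at least one shared source are considered
        # Same rule as the query: common / 0.8 must cover both source counts.
        if (common / threshold >= len(sources[m1])
                and common / threshold >= len(sources[m2])):
            pairs.append((m1, m2))
    return pairs


print(similar_targets(edges))  # [('s3', 's4')]
```

Here s3 and s4 share both of their sources (2 / 0.8 = 2.5, which covers 2 and 2), while s5 shares only one source with either of them (1 / 0.8 = 1.25, which does not cover the 2 sources of s3 or s4).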

Query:

// candidate pairs: targets that share at least one source
MATCH (m1)<-[:ConnectsTo]-()-[:ConnectsTo]->(m2)
// count all sources of each target
MATCH
  (n1)-[:ConnectsTo]->(m1),
  (n2)-[:ConnectsTo]->(m2)
WITH m1, m2, COUNT(DISTINCT n1) AS sources1Count, COUNT(DISTINCT n2) AS sources2Count
// count the sources that both targets have in common
MATCH (m1)<-[:ConnectsTo]-(n)-[:ConnectsTo]->(m2)
WITH m1, m2, sources1Count, sources2Count, COUNT(n) AS commonSources
WHERE
  // we only need each m1-m2 pair once
  ID(m1) < ID(m2) AND
  // similarity: the common sources must make up at least 80% of each target's sources
  commonSources / 0.8 >= sources1Count AND
  commonSources / 0.8 >= sources2Count
RETURN m1, m2
ORDER BY m1.name, m2.name

This gives the following result:

╒══════════╤══════════╕
│m1        │m2        │
╞══════════╪══════════╡
│{name: s3}│{name: s4}│
└──────────┴──────────┘

PS. for checking the similarity, you could use something like:

sources1Count <= toInt(commonSources / 0.8) >= sources2Count

This avoids repeating 0.8 but does not look very nice. Note that toInt was removed in Neo4j 4.0; on newer versions use toInteger instead.
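Truncating to an integer should not change the outcome, because the source counts are whole numbers: for a non-negative value x and an integer n, the truncated x is at least n exactly when x is at least n. The Python sketch below (outside Neo4j, with hypothetical function names) compares the two forms of the check by brute force over small counts.

```python
def similar_two_checks(common, n1, n2, threshold=0.8):
    # Form used in the main query: two separate floating-point comparisons.
    return common / threshold >= n1 and common / threshold >= n2


def similar_chained(common, n1, n2, threshold=0.8):
    # Form from the PS: truncate once, then use a chained comparison.
    # int() truncates toward zero, standing in for Cypher's integer conversion.
    return n1 <= int(common / threshold) >= n2


# Compare both forms for every small combination of counts.
mismatches = [
    (common, n1, n2)
    for common in range(0, 21)
    for n1 in range(0, 31)
    for n2 in range(0, 31)
    if similar_two_checks(common, n1, n2) != similar_chained(common, n1, n2)
]

print(len(mismatches))  # 0
```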

Update: an idea from InverseFalcon in the comments: use SIZE instead of MATCH and COUNT

MATCH (m1)<-[:ConnectsTo]-()-[:ConnectsTo]->(m2)
WITH m1, m2, SIZE(()-[:ConnectsTo]->(m1)) AS sources1Count, SIZE(()-[:ConnectsTo]->(m2)) AS sources2Count
MATCH (m1)<-[:ConnectsTo]-(n)-[:ConnectsTo]->(m2)
WITH m1, m2, sources1Count, sources2Count, COUNT(n) AS commonSources
WHERE
  // we only need each m1-m2 pair once
  ID(m1) < ID(m2) AND
  // similarity
  commonSources / 0.8 >= sources1Count AND
  commonSources / 0.8 >= sources2Count
RETURN m1, m2
ORDER BY m1.name, m2.name
