Neo4j Recommendation Cypher Query Optimization

Question

I am using Neo4j Community Edition embedded in a Java application for recommendations. I wrote a custom function containing complex logic that compares two entities, a product and a user. Both entities exist as nodes in the graph, and each has more than 20 properties used in the comparison. For example, I call the function like this:

match (e:User {user_id:"some-id"}) with e
match (f:Product {product_id:"some-id"}) with e,f
return e,f,findComparisonValue(e,f) as pref_value; 

On average, this function call takes about 4-5 ms to run. To recommend the best products to a particular user, I wrote a Cypher query that iterates over all products, calculates the pref_value for each, and ranks them. The query looks like this:

MATCH (source:User) WHERE id(source)={id} with source 
MATCH (reco:Product) WHERE reco.is_active='t'  
with reco, source, findComparisonValue(source, reco) as score_result 
RETURN distinct reco, score_result.score as score, score_result.params as params, score_result.matched_keywords as matched_keywords 
order by score desc

Some details about the graph structure:

Total Number of nodes: 2 million
Total Number of relationships: 20 million
Total Number of Users: 0.2 million
Total Number of Products: 1.8 million

The above Cypher query takes more than 10 seconds because it iterates over all the products. On top of this query, I am using the graphaware-reco module for my recommendation needs (precomputation, filtering, post-processing, etc.). I thought of parallelising this, but Community Edition does not support clustering. As the number of users in the system is increasing day by day, I need a scalable solution.

Can anyone help me optimize this query?

Recommended answer

As others have commented, doing a significant calculation potentially millions of times in a single query is going to be slow, and does not take advantage of Neo4j's strengths. You should investigate modifying your data model and calculation so that you can leverage relationships and/or indexes.

In the meantime, there are a number of things to suggest with your second query:

  1. Make sure you have created an index for :Product(is_active), so that it is not necessary to scan all products. (By the way, if that property is actually supposed to be a boolean, then consider making it a boolean rather than a string.)

  2. The RETURN clause should not need the DISTINCT operator, since all the result rows should be distinct anyway. This is because every reco value is already distinct. Removing that keyword should improve performance.
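
The two suggestions could be applied roughly as follows. This is a minimal sketch, not a tested solution: it assumes Neo4j 3.x (inferred from the {id} parameter style in the question), the index name in the comment is hypothetical, and findComparisonValue is the asker's own custom function, called exactly as in the question. The schema statement and the query should be run separately, since Neo4j does not allow schema and data operations in the same transaction.

```cypher
// One-time schema setup, Neo4j 3.x syntax.
// On Neo4j 4.x and later the equivalent form is (index name is hypothetical):
//   CREATE INDEX product_is_active FOR (p:Product) ON (p.is_active)
CREATE INDEX ON :Product(is_active);

// Ranking query with DISTINCT removed. Each reco node appears in exactly
// one row, so the rows are already unique.
MATCH (source:User) WHERE id(source) = {id}
MATCH (reco:Product) WHERE reco.is_active = 't'
WITH reco, findComparisonValue(source, reco) AS score_result
RETURN reco,
       score_result.score AS score,
       score_result.params AS params,
       score_result.matched_keywords AS matched_keywords
ORDER BY score DESC;
```

Note that the index only avoids scanning inactive products. If most of the 1.8 million products are active, the custom function is still invoked once per active product, so the larger gain would have to come from the data-model changes mentioned above.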
