Neo4j Recommendation Cypher Query Optimization
Question
I am using Neo4j Community Edition, embedded in a Java application, for recommendations. I made a custom function that contains complex logic for comparing two entities, namely products and users. Both entities exist as nodes in the graph, and each has more than 20 properties used for the comparison. For example, I call this function in the following format:
match (e:User {user_id:"some-id"}) with e
match (f:Product {product_id:"some-id"}) with e,f
return e,f,findComparisonValue(e,f) as pref_value;
On average, this function call takes about 4 to 5 ms. Now, to recommend the best products to a particular user, I wrote a Cypher query that iterates over all products, calculates the pref_value for each one, and ranks them. My Cypher query looks like this:
MATCH (source:User) WHERE id(source)={id} with source
MATCH (reco:Product) WHERE reco.is_active='t'
with reco, source, findComparisonValue(source, reco) as score_result
RETURN distinct reco, score_result.score as score, score_result.params as params, score_result.matched_keywords as matched_keywords
order by score desc
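To make the cost of this query concrete, here is a minimal plain-Java sketch of the same shape: keep the active products, score each one against the user, and sort by score descending. The `User` and `Product` classes, the `budget`/`price` fields, and `toyScore` are hypothetical stand-ins for the real nodes and for `findComparisonValue`. The point is only that the scoring function runs once per active product, so total time grows linearly with the number of products, plus an O(n log n) sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical plain-Java model of the ranking query; this is not Neo4j API code.
final class FullScanRanking {

    // Stand-in for a :User node; "budget" is an invented property.
    static final class User {
        final double budget;
        User(double budget) { this.budget = budget; }
    }

    // Stand-in for a :Product node with a boolean active flag.
    static final class Product {
        final String id;
        final boolean active;
        final double price;
        Product(String id, boolean active, double price) {
            this.id = id;
            this.active = active;
            this.price = price;
        }
    }

    // One result row: the product and its score.
    static final class Scored {
        final Product product;
        final double score;
        Scored(Product product, double score) {
            this.product = product;
            this.score = score;
        }
    }

    // Toy replacement for findComparisonValue: a price closer to the budget scores higher.
    static double toyScore(User user, Product product) {
        return -Math.abs(user.budget - product.price);
    }

    // Same shape as the Cypher query: filter, score every candidate, sort descending.
    static List<Scored> rankAll(User user, List<Product> products) {
        List<Scored> rows = new ArrayList<>();
        for (Product p : products) {
            if (!p.active) {
                continue; // corresponds to WHERE reco.is_active = 't'
            }
            rows.add(new Scored(p, toyScore(user, p))); // one scoring call per active product
        }
        // corresponds to ORDER BY score DESC
        rows.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return rows;
    }

    public static void main(String[] args) {
        List<Product> products = new ArrayList<>();
        products.add(new Product("p1", true, 10));
        products.add(new Product("p2", true, 50));
        products.add(new Product("p3", false, 40));
        for (Scored s : rankAll(new User(45), products)) {
            System.out.println(s.product.id + " " + s.score);
        }
    }
}
```

With 1.8 million products, the loop body runs once per active product on every request, which is why the query time tracks the catalogue size.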
Some details about the graph structure:
Total Number of nodes: 2 million
Total Number of relationships: 20 million
Total Number of Users: 0.2 million
Total Number of Products: 1.8 million
The above Cypher query takes more than 10 seconds because it iterates over all the products. On top of this query, I use the graphaware-reco module for my recommendation needs (precomputation, filtering, post-processing, etc.). I thought of parallelising this, but Community Edition does not support clustering. Now, as the number of users in the system grows day by day, I need a scalable solution.
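Parallelising the scoring does not by itself require a cluster: each product's score is independent of the others, so within a single JVM the work can be spread across CPU cores. The following is a minimal sketch that assumes the relevant product properties have already been copied out of the graph into plain Java objects. The `Product` class, its `price` field, and `toyScore` are hypothetical stand-ins for the real nodes and `findComparisonValue`. It uses the standard `parallelStream()`; how live Neo4j nodes may be read from worker threads is a separate question that this sketch does not address.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: score products on all cores of one machine, no cluster involved.
final class ParallelScoring {

    // Plain snapshot of the product properties needed for scoring (invented fields).
    static final class Product {
        final String id;
        final double price;
        Product(String id, double price) {
            this.id = id;
            this.price = price;
        }
    }

    static final class Scored {
        final Product product;
        final double score;
        Scored(Product product, double score) {
            this.product = product;
            this.score = score;
        }
    }

    // Toy, side-effect-free scoring function, so it is safe to call from many threads.
    static double toyScore(double userBudget, Product p) {
        return -Math.abs(userBudget - p.price);
    }

    // Scores are computed in parallel; sorted() then yields one ordered result list.
    static List<Scored> rankParallel(double userBudget, List<Product> products) {
        return products.parallelStream()
                .map(p -> new Scored(p, toyScore(userBudget, p)))
                .sorted(Comparator.comparingDouble((Scored s) -> s.score).reversed())
                .collect(Collectors.toList());
    }
}
```

Because `toyScore` touches no shared mutable state, the parallel stream needs no locking; a real comparison function would have to meet the same condition to be parallelised this way.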
Can anyone help me optimize this query?
Recommended Answer
As others have commented, performing a significant calculation potentially millions of times in a single query is going to be slow, and it does not take advantage of Neo4j's strengths. You should investigate modifying your data model and calculation so that you can leverage relationships and/or indexes.
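One way to apply "leverage relationships and/or indexes" is to narrow the candidate set before scoring, so the expensive comparison runs on a small subset rather than on every product. The sketch below models that with an in-memory map keyed by a hypothetical `category` property; a graph model might, for example, connect users and products to shared category nodes and follow those relationships instead. The class, field, and method names are illustrative assumptions, not part of the question or of Neo4j's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: shrink the candidate set before running the expensive score.
final class CandidateFiltering {

    static final class Product {
        final String id;
        final String category; // invented property used as the filter key
        Product(String id, String category) {
            this.id = id;
            this.category = category;
        }
    }

    // Built once (for example at load time): category -> products in that category.
    static Map<String, List<Product>> indexByCategory(List<Product> products) {
        Map<String, List<Product>> index = new HashMap<>();
        for (Product p : products) {
            index.computeIfAbsent(p.category, k -> new ArrayList<>()).add(p);
        }
        return index;
    }

    // Per request: only products in the user's categories go on to be scored.
    static List<Product> candidates(Map<String, List<Product>> index, Set<String> userCategories) {
        List<Product> result = new ArrayList<>();
        for (String category : userCategories) {
            result.addAll(index.getOrDefault(category, Collections.emptyList()));
        }
        return result;
    }
}
```

Scoring then runs only over the list returned by `candidates(...)`, so the per-request cost follows the size of the user's categories rather than the full 1.8 million products.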
In the meantime, here are some suggestions for your second query:
- Make sure you have created an index on :Product(is_active), so that it is not necessary to scan all products. (By the way, if that property is really meant to be a boolean, consider storing it as a boolean rather than a string.)
- The RETURN clause should not need the DISTINCT operator, since all the result rows should be distinct anyway: every reco value is already distinct. Removing that keyword should improve performance.
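Both suggestions can be illustrated in plain Java. In the minimal sketch below, all names are hypothetical and a `Map` stands in for a Neo4j index. It converts a legacy string flag (assumed to hold `"t"` or `"f"`) into a real `boolean`, groups products once by that flag so a lookup returns only the active ones without re-checking every product, and shows that a `distinct()` pass over rows that are already unique by product id removes nothing while still hashing every row.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical sketch of the two suggestions; a Map stands in for a Neo4j index.
final class ActiveIndexSketch {

    static final class Product {
        final String id;
        final boolean active;
        Product(String id, boolean active) {
            this.id = id;
            this.active = active;
        }

        // Rows are equal when they refer to the same product, like distinct reco values.
        @Override
        public boolean equals(Object o) {
            return o instanceof Product && ((Product) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id);
        }
    }

    // Converts an assumed legacy string flag ("t"/"f") into a proper boolean.
    static boolean parseLegacyFlag(String flag) {
        if ("t".equals(flag)) return true;
        if ("f".equals(flag)) return false;
        throw new IllegalArgumentException("unexpected flag: " + flag);
    }

    // Grouped once up front; afterwards get(true) returns the active products directly.
    static Map<Boolean, List<Product>> indexByActive(List<Product> products) {
        return products.stream().collect(Collectors.partitioningBy(p -> p.active));
    }

    // DISTINCT over rows that are already unique keeps every row, but still hashes each one.
    static int distinctCount(List<Product> rows) {
        return (int) rows.stream().distinct().count();
    }
}
```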