Search queries in Neo4j: how to sort the results of a START query with internal TF-IDF / Levenshtein or other algorithms?

Question

I am working on a model that uses Wikipedia topic names for my experiments with a full-text index.

I set up an index on 'topic' (legacy) and run a full-text search for 'united states':

start n=node:topic('name:(united states)') return n

The first results are not relevant at all:

'List of United States National Historic Landmarks in United States commonwealths and territories, associated states, and foreign states'

[...]

and the actual 'United States' topic is buried deep in the list.

This raises a problem: to find the best match among the results (e.g. with Levenshtein distance, bi-grams, or similar algorithms), you first have to fetch all the items matching the pattern.
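Ranking by Levenshtein distance means scoring every fetched candidate against the query on the client side. A minimal sketch in Python, assuming the candidate names have already been pulled out of the index; `levenshtein` and `rerank_by_edit_distance` are hypothetical helpers, not part of Neo4j or Lucene:

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; each row has len(b) + 1 cells
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def rerank_by_edit_distance(query: str, names: List[str]) -> List[str]:
    """Return names ordered by case-insensitive edit distance to the query."""
    q = query.lower()
    return sorted(names, key=lambda name: levenshtein(q, name.lower()))


candidates = [
    "List of United States National Historic Landmarks in United States "
    "commonwealths and territories, associated states, and foreign states",
    "United States Army",
    "United States",
]
print(rerank_by_edit_distance("united states", candidates)[0])  # United States
```

Raw edit distance strongly favours short titles close in length to the query, which is what pushes the exact 'United States' topic to the top here; it is still a post-processing step, so it only helps once the candidates are in memory.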

That would be a serious constraint, because in this case alone I have 21K rows, which take ~4 seconds.
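If every matching row has to be fetched anyway, the ranking step itself can at least avoid a full sort. A sketch in Python, assuming character bi-gram similarity (the Dice coefficient) as the scoring function; `bigrams`, `dice` and `top_k` are hypothetical helpers, not Neo4j APIs. It keeps only the k best names in a bounded heap, so it does not reduce the cost of fetching the 21K rows in the first place:

```python
import heapq
from typing import Iterable, List, Set


def bigrams(text: str) -> Set[str]:
    """Set of adjacent character pairs in the lower-cased text."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over bigram sets: 1.0 for identical sets, 0.0 for disjoint ones."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        # Strings shorter than two characters have no bigrams; compare them directly.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def top_k(query: str, names: Iterable[str], k: int) -> List[str]:
    """The k names most similar to the query, best first.

    heapq.nlargest keeps a heap of at most k entries while streaming over
    `names`, so ranking costs O(n log k) time instead of sorting all n rows.
    """
    return heapq.nlargest(k, names, key=lambda name: dice(query, name))


print(top_k("united states", ["Canada", "United States Army", "United States"], 2))
# ['United States', 'United States Army']
```

Because `top_k` accepts any iterable, it can consume rows as they stream out of a query result instead of materialising the whole list first.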

Which algorithms does Neo4j use to order the results of a full-text search (START)? What rationale does it use to sort the results, and how can I change it using Cypher? The documentation says to use the Java API to apply sort(); it would be very useful to have a tutorial pointing to which files to modify, and also to know which ranking rationale is used before making any tweaks.
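To make the TF-IDF part of the question concrete, here is a simplified sketch of Lucene's classic TF-IDF scoring in Python. It assumes the legacy index scores with Lucene's classic similarity; that is an assumption to verify against the Lucene version bundled with your Neo4j release. Per matching query term it uses sqrt(freq) * idf² * 1/sqrt(doc length), with idf = 1 + ln(numDocs / (docFreq + 1)), and deliberately omits query norm, coord, boosts and Lucene's lossy norm encoding. `tokenize` and `classic_tfidf_scores` are hypothetical helpers:

```python
import math
import re
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lower-case, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def classic_tfidf_scores(query: str, docs: Sequence[str]) -> List[float]:
    """Simplified classic-Lucene-style TF-IDF score for every doc (see lead-in)."""
    tokenized = [tokenize(d) for d in docs]
    num_docs = len(tokenized)
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))  # count each term once per document
    query_terms = set(tokenize(query))

    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        length_norm = 1.0 / math.sqrt(len(toks)) if toks else 0.0
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue  # unmatched terms contribute nothing
            idf = 1.0 + math.log(num_docs / (doc_freq[term] + 1))
            score += math.sqrt(counts[term]) * idf * idf * length_norm
        scores.append(score)
    return scores


docs = [
    "List of United States National Historic Landmarks in United States "
    "commonwealths and territories, associated states, and foreign states",
    "United States",
    "Canada",
]
for doc, score in zip(docs, classic_tfidf_scores("united states", docs)):
    print(f"{score:.3f}  {doc[:40]}")
```

Under this simplified model the short exact title 'United States' scores higher than the long list title, because the length norm outweighs the extra term occurrences. If the real results come back with the long title first, it is worth checking whether they are being returned in score order at all.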

EDITED based on comments below: pagination of results is possible as follows:

start n=node:topic('name:(united states)') return n skip 10 limit 50;

(skip must come before limit), but I need to ensure the first results are meaningful before paginating.
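The concern here is the order of operations: ranking has to run over the whole candidate set before SKIP and LIMIT cut a page out of it, otherwise the page is just an arbitrary slice. A minimal sketch in Python with a hypothetical `ranked_page` helper and made-up relevance scores standing in for whatever scoring function you choose:

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ranked_page(items: Sequence[T], score: Callable[[T], float],
                skip: int, limit: int) -> List[T]:
    """Sort by score (highest first), then drop `skip` items and keep `limit`.

    Mirrors the clause order ORDER BY ... SKIP ... LIMIT: ranking happens
    over the full candidate set before any page is cut out of it.
    """
    if skip < 0 or limit < 0:
        raise ValueError("skip and limit must be non-negative")
    ranked = sorted(items, key=score, reverse=True)
    return ranked[skip:skip + limit]


# Illustrative scores only; in practice they would come from your ranking step.
relevance = {"Canada": 0.05, "United States Army": 0.7, "United States": 0.95}
print(ranked_page(list(relevance), lambda name: relevance[name], skip=0, limit=2))
# ['United States', 'United States Army']
```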

Answer

I don't know which algorithm Lucene uses to order the results. However, regarding pagination, if you change the order of limit and skip as follows (skip first, then limit), it should be ok:

start n=node:topic('name:(united states)') return n skip 10 limit 50;

I would also add that if you are performing full-text search, a solution like Solr may be more appropriate.