Search queries in neo4j: how to sort results in a START query with internal TF-IDF / Levenshtein or other algorithms?


Problem description




I am working on a model that uses the names of Wikipedia topics for my experiments with full-text indexing.

I set up an index on 'topic' (legacy) and ran a full-text search for 'united states':

start n=node:topic('name:(united states)') return n

The first results are not relevant at all:

'List of United States National Historic Landmarks in United States commonwealths and territories, associated states, and foreign states'

[...]

and the actual 'united states' is buried deep down the list.

This raises a problem: in order to find the best match among the results (e.g. with Levenshtein, bi-gram, or similar algorithms), you first have to fetch all the items matching the pattern.

That would be a serious constraint, because in this case alone I have 21K rows, which takes ~4 seconds.
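For illustration, the client-side re-ranking described above could look roughly like the following Python sketch. It assumes the matching names have already been fetched from the index; the `levenshtein` and `rerank` helpers and the sample names (other than the long title quoted above) are hypothetical and are not part of neo4j or Lucene.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


def rerank(query: str, names: List[str]) -> List[str]:
    """Sort fetched names by edit distance to the query, closest first."""
    q = query.lower()
    return sorted(names, key=lambda name: (levenshtein(q, name.lower()), name))


# Hypothetical result set, standing in for the rows returned by the index.
candidates = [
    "List of United States National Historic Landmarks in United States "
    "commonwealths and territories, associated states, and foreign states",
    "United States Navy",
    "United States",
]
ranked = rerank("united states", candidates)
```

Every candidate has to be in memory before sorting, which is exactly the cost mentioned above: the distance is computed once per fetched row.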

Which algorithm does neo4j use to order the results of a full-text search (START)? What rationale does it use to sort the results, and how can it be changed using Cypher? The documentation says to use the Java API to apply sort(). It would be very useful to have a tutorial pointing out which files to modify, and to know which ranking rationale is used before making any tweak.

EDITED based on the comments below: pagination of the results is possible with:

start n=node:topic('name:(united states)') return n skip 10 limit 50;

(skip before limit), but I need to ensure that the first results are meaningful before paginating.

Solution

I don't know which algorithm Lucene uses to order the results. As for pagination, however, it should work if you swap the order of limit and skip as follows:

start n=node:topic('name:(united states)') return n skip 10 limit 50;

I would also add that if you are performing full-text search, a solution like Solr may be more appropriate.
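As a rough illustration of what skip followed by limit means, here is a minimal Python sketch that drops the first `skip` rows and then keeps at most `limit` rows. The `paginate` helper and the `topic-N` row names are hypothetical stand-ins for the query results.

```python
from itertools import islice


def paginate(rows, skip, limit):
    """Drop the first `skip` rows, then return at most `limit` rows."""
    return list(islice(rows, skip, skip + limit))


# Hypothetical, already ordered result set of 100 rows.
rows = [f"topic-{i}" for i in range(1, 101)]
page = paginate(rows, skip=10, limit=50)
```

With skip 10 and limit 50 the page covers rows 11 through 60 of the ordered result, so the ordering has to be meaningful before the page is cut.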
