JOINS in Lucene

Question

Is there any way to implement JOINS in Lucene?

Recommended answer

You can do a generic join by hand - run two searches, get all results (instead of top N), sort them on your join key and intersect two ordered lists. But that's gonna thrash your heap real hard (if the lists even fit in it).
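
The by-hand join described above could be sketched roughly as follows. This is a library-free Python model, not Lucene code: the `Hit` shape (a join key plus a doc id) and the `merge_join` name are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical shape of one search hit: (join key, doc id).
Hit = Tuple[str, int]


def merge_join(left: Iterable[Hit], right: Iterable[Hit]) -> Iterator[Tuple[str, int, int]]:
    """Yield (key, left_doc, right_doc) for every pair of hits sharing a join key."""
    # Both result sets are fully materialized and sorted on the join key.
    # This is the memory cost the answer warns about.
    ls: List[Hit] = sorted(left)
    rs: List[Hit] = sorted(right)
    i = j = 0
    while i < len(ls) and j < len(rs):
        lk, rk = ls[i][0], rs[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(ls) and ls[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(rs) and rs[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, ldoc in ls[i:i_end]:
                for _, rdoc in rs[j:j_end]:
                    yield lk, ldoc, rdoc
            i, j = i_end, j_end
```

For example, `list(merge_join([("b", 2), ("a", 1)], [("a", 10), ("a", 11)]))` gives `[("a", 1, 10), ("a", 1, 11)]`.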

There are possible optimizations, but only under very specific conditions.
For example: you do a self-join, and only use (random access) Filters for filtering, no Queries. Then you can manually iterate terms on your two join fields (in parallel), intersect docId lists for each term, filter them, and there's your join.
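
One possible reading of that optimization, as a library-free Python model rather than Lucene code. It assumes each join field is a mapping from term to sorted doc ids, and that each filter is a random-access set of allowed doc ids; `Postings` and `self_join` are hypothetical names.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Stand-in for one field's term dictionary: term -> sorted doc ids.
Postings = Dict[str, List[int]]


def self_join(
    field_a: Postings,
    field_b: Postings,
    filter_a: Set[int],
    filter_b: Set[int],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Walk both term lists in parallel and yield the filtered docs for each shared term."""
    terms_a = sorted(field_a)
    terms_b = sorted(field_b)
    i = j = 0
    while i < len(terms_a) and j < len(terms_b):
        ta, tb = terms_a[i], terms_b[j]
        if ta < tb:
            i += 1
        elif ta > tb:
            j += 1
        else:
            # Same term in both fields: apply the random-access filters.
            docs_a = [d for d in field_a[ta] if d in filter_a]
            docs_b = [d for d in field_b[tb] if d in filter_b]
            if docs_a and docs_b:
                yield ta, docs_a, docs_b
            i += 1
            j += 1
```

Only one term's doc id lists are held at a time, which is why this avoids materializing and sorting the full result sets.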

There's an approach handling the popular use case of simple parent-child relationships with a relatively small number of children per document: https://issues.apache.org/jira/browse/LUCENE-2454
Unlike the flattening method mentioned by @ntziolis, this approach correctly handles cases like this: you have a number of resumes, each with multiple work_experience children, and you want to find someone who worked at company NNN in year YYY. If simply flattened, you'll get back resumes for people who worked for NNN in any year and worked somewhere in year YYY.
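
The false positive from flattening can be seen in a small library-free Python model. This only illustrates the matching semantics, not how LUCENE-2454 is implemented; the dictionary layout and both function names are assumptions.

```python
from typing import Dict, List, Union

# Hypothetical child record, e.g. {"company": "NNN", "year": 2001}.
Job = Dict[str, Union[str, int]]


def flattened_match(jobs: List[Job], company: str, year: int) -> bool:
    """Flattened parent: child values are pooled, so the pairing between them is lost."""
    companies = {job["company"] for job in jobs}
    years = {job["year"] for job in jobs}
    return company in companies and year in years


def per_child_match(jobs: List[Job], company: str, year: int) -> bool:
    """Parent-child semantics: both conditions must hold on the same child."""
    return any(job["company"] == company and job["year"] == year for job in jobs)


resume = [
    {"company": "NNN", "year": 2001},
    {"company": "Other", "year": 2005},
]
print(flattened_match(resume, "NNN", 2005))  # True: a false positive
print(per_child_match(resume, "NNN", 2005))  # False: no single job matches both
```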

An alternative for handling simple parent-child cases is to flatten your doc, indeed, but ensure values for different children are separated by a big posIncrement gap, and then use a SpanNear query to prevent your several subqueries from matching across children. There was a LinkedIn presentation about this from a few years back, but I failed to find it.
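
The effect of the position gap can be modeled without Lucene. In this Python sketch, `GAP`, `index_positions` and `near` are hypothetical names, and `near` is a simplified distance check that only approximates how a SpanNear query interprets slop.

```python
from typing import Dict, List

GAP = 1000  # assumed position increment gap inserted between children


def index_positions(children: List[List[str]], gap: int = GAP) -> Dict[str, List[int]]:
    """Flatten the children's tokens into one position space, leaving a gap after each child."""
    positions: Dict[str, List[int]] = {}
    pos = 0
    for child in children:
        for token in child:
            positions.setdefault(token, []).append(pos)
            pos += 1
        pos += gap  # push the next child's tokens far away
    return positions


def near(positions: Dict[str, List[int]], a: str, b: str, slop: int) -> bool:
    """True if some occurrence of a lies within slop positions of some occurrence of b."""
    return any(
        abs(pa - pb) <= slop
        for pa in positions.get(a, [])
        for pb in positions.get(b, [])
    )


children = [["nnn", "2001"], ["acme", "2005"]]
gapped = index_positions(children)
print(near(gapped, "nnn", "2001", slop=10))  # True: same child
print(near(gapped, "nnn", "2005", slop=10))  # False: the gap separates the children
```

The slop has to stay smaller than the gap, otherwise matches leak across children again, as they do when `gap=0`.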
