JOINS in Lucene

Question

Is there any way to implement JOINS in Lucene?

Recommended answer

You can do a generic join by hand: run two searches, get all results (instead of the top N), sort them on your join key and intersect the two ordered lists. But that's going to thrash your heap real hard (if the lists even fit in it).
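
The by-hand join described above is essentially a sort-merge join. Here is a minimal sketch in plain Java, with no Lucene calls at all. The `Hit` class and the `join` method are hypothetical names made up for illustration, and the sketch assumes you have already collected every hit from both searches together with its join-key value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MergeJoinSketch {
    /** One collected search hit: doc id plus its join-key value (hypothetical shape). */
    static final class Hit {
        final int docId;
        final String joinKey;
        Hit(int docId, String joinKey) { this.docId = docId; this.joinKey = joinKey; }
    }

    /** Returns {leftDocId, rightDocId} pairs for every pair of hits sharing a join key. */
    static List<int[]> join(List<Hit> left, List<Hit> right) {
        // Copy before sorting so the caller's lists are left untouched.
        List<Hit> l = new ArrayList<>(left);
        List<Hit> r = new ArrayList<>(right);
        l.sort(Comparator.comparing((Hit h) -> h.joinKey));
        r.sort(Comparator.comparing((Hit h) -> h.joinKey));

        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.size() && j < r.size()) {
            int cmp = l.get(i).joinKey.compareTo(r.get(j).joinKey);
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                // Same key on both sides: emit the cross product of the two runs.
                String key = l.get(i).joinKey;
                int runStart = j;
                while (i < l.size() && l.get(i).joinKey.equals(key)) {
                    for (j = runStart; j < r.size() && r.get(j).joinKey.equals(key); j++) {
                        out.add(new int[] { l.get(i).docId, r.get(j).docId });
                    }
                    i++;
                }
                // j now points just past the run of `key` on the right side.
            }
        }
        return out;
    }
}
```

Both full result lists and the output live in memory at once, which is exactly the heap pressure the answer warns about.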

There are possible optimizations, but under very specific conditions.
I.e., you do a self-join, and only use (random access) Filters for filtering, no Queries. Then you can manually iterate terms on your two join fields (in parallel), intersect docId lists for each term, filter them - and here's your join.
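
To make the term-iteration idea more concrete, here is a rough sketch in plain Java rather than against the Lucene API. It assumes each join field's term dictionary can be modeled as a `SortedMap` (natural string order) from term to doc ids, and each random-access filter as a `BitSet`. The class and method names are hypothetical, and reading "intersect" as "intersect each posting list with its filter" is my interpretation of the answer.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

final class TermJoinSketch {
    /**
     * Walks two sorted term dictionaries in parallel. For each term present in
     * both fields, keeps only doc ids accepted by the filters and emits
     * {leftDocId, rightDocId} pairs. Assumes both maps use natural String order.
     */
    static List<int[]> join(SortedMap<String, int[]> leftField,
                            SortedMap<String, int[]> rightField,
                            BitSet leftFilter,
                            BitSet rightFilter) {
        Iterator<Map.Entry<String, int[]>> li = leftField.entrySet().iterator();
        Iterator<Map.Entry<String, int[]>> ri = rightField.entrySet().iterator();
        Map.Entry<String, int[]> l = next(li);
        Map.Entry<String, int[]> r = next(ri);

        List<int[]> out = new ArrayList<>();
        while (l != null && r != null) {
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp < 0) {
                l = next(li);          // term only on the left so far: skip it
            } else if (cmp > 0) {
                r = next(ri);          // term only on the right so far: skip it
            } else {
                for (int leftDoc : l.getValue()) {
                    if (!leftFilter.get(leftDoc)) continue;   // random-access filter check
                    for (int rightDoc : r.getValue()) {
                        if (rightFilter.get(rightDoc)) {
                            out.add(new int[] { leftDoc, rightDoc });
                        }
                    }
                }
                l = next(li);
                r = next(ri);
            }
        }
        return out;
    }

    private static <T> T next(Iterator<T> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```

Because only one term's doc ids are examined at a time, nothing has to be sorted or held in full, which is where the saving over the generic approach comes from.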

There's an approach handling a popular use-case of simple parent-child relationships with a relatively small number of children per document - https://issues.apache.org/jira/browse/LUCENE-2454
Unlike the flattening method mentioned by @ntziolis, this approach correctly handles cases like: you have a number of resumes, each with multiple work_experience children, and you try to find someone who worked at company NNN in year YYY. If simply flattened, you'll get back resumes for people who worked for NNN in any year & worked somewhere in year YYY.
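
The difference between the two matching semantics is easy to see in a small plain-Java model. This is not the LUCENE-2454 API, just an illustration with hypothetical `Resume` and `Job` classes: `flattenedMatch` treats all of a resume's children as one bag of values, while `perChildMatch` requires both conditions to hold on the same child.

```java
import java.util.ArrayList;
import java.util.List;

final class ParentChildSketch {
    /** One work_experience child (hypothetical shape). */
    static final class Job {
        final String company;
        final int year;
        Job(String company, int year) { this.company = company; this.year = year; }
    }

    /** One parent resume with its children (hypothetical shape). */
    static final class Resume {
        final String name;
        final List<Job> jobs;
        Resume(String name, List<Job> jobs) { this.name = name; this.jobs = jobs; }
    }

    /** Flattened semantics: company and year may come from different children. */
    static List<String> flattenedMatch(List<Resume> resumes, String company, int year) {
        List<String> out = new ArrayList<>();
        for (Resume r : resumes) {
            boolean anyCompany = false, anyYear = false;
            for (Job j : r.jobs) {
                anyCompany |= j.company.equals(company);
                anyYear |= j.year == year;
            }
            if (anyCompany && anyYear) out.add(r.name);
        }
        return out;
    }

    /** Parent-child semantics: a single child must satisfy both conditions. */
    static List<String> perChildMatch(List<Resume> resumes, String company, int year) {
        List<String> out = new ArrayList<>();
        for (Resume r : resumes) {
            for (Job j : r.jobs) {
                if (j.company.equals(company) && j.year == year) {
                    out.add(r.name);
                    break;   // one matching child is enough for this parent
                }
            }
        }
        return out;
    }
}
```

For a resume with jobs (NNN, 2005) and (Other, 2010), a query for NNN in 2010 matches under the flattened semantics but not under the per-child semantics, which is the false positive described above.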

An alternative for handling simple parent-child cases is to flatten your doc, indeed, but ensure values for different children are separated by a big posIncrement gap, and then use a SpanNear query to prevent your several subqueries from matching across children. There was a LinkedIn presentation about this a few years ago, but I failed to find it.
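
Here is a rough plain-Java model of why the position gap works. It is not Lucene code: `flatten` and `near` are hypothetical helpers, and `near` uses a simplified "positions differ by at most `slop`" rule, which only approximates how SpanNear slop is actually computed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PositionGapSketch {
    /**
     * Flattens the children's tokens into one position space, leaving `gap`
     * unused positions after each child. Returns token -> positions.
     */
    static Map<String, List<Integer>> flatten(List<List<String>> children, int gap) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (List<String> child : children) {
            for (String token : child) {
                positions.computeIfAbsent(token, t -> new ArrayList<>()).add(pos);
                pos++;
            }
            pos += gap;   // the big position increment between children
        }
        return positions;
    }

    /** Simplified proximity test: true if some occurrences of a and b lie within `slop` positions. */
    static boolean near(Map<String, List<Integer>> positions, String a, String b, int slop) {
        List<Integer> pa = positions.getOrDefault(a, Collections.<Integer>emptyList());
        List<Integer> pb = positions.getOrDefault(b, Collections.<Integer>emptyList());
        for (int x : pa) {
            for (int y : pb) {
                if (Math.abs(x - y) <= slop) return true;
            }
        }
        return false;
    }
}
```

As long as the gap is larger than the slop (and the slop is at least as large as one child's token count), terms can only be "near" each other when they come from the same child.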