User-defined termvectors in ElasticSearch

Question

How (if at all possible) can one insert any term-vector in an ElasticSearch index?

ES computes term vectors behind the scenes in order to carry out its text-mining tasks, but it would be useful to be able to enter an arbitrary list of (term, weight) pairs instead.

Why?

Well, for instance, though ES enables kNN (k-nearest-neighbors) for k=2 in the context of geographic proximity, it doesn't have any explicit k>2 functionality. If we were able to insert our own term vectors, we could hack a k>2 functionality by harnessing ES's built-in text-indexing methods.

Any indications on this issue?

Answer

As far as I know, there's no way to do that in Elasticsearch (I'm still looking for the fastest approach to real-time KNN search, and Elasticsearch is one of the options I'm considering).

Elasticsearch is based on an inverted index, so each term in the term vector (which may come from a sentence) is indexed in a sorted list. When we search with a query, the query is analyzed into a term vector, and Elasticsearch (actually Lucene) looks up each of those terms in the index.
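
To make that lookup flow concrete, here is a toy Python sketch of an inverted index. It is a simplification, not Lucene's actual data structures: the whitespace-splitting `analyze` function and the `build_inverted_index` and `search` helpers are hypothetical stand-ins for real analyzers and query execution.

```python
from collections import defaultdict


def analyze(text):
    """Hypothetical toy analyzer: lowercase and split on whitespace.
    Real Elasticsearch analyzers do much more (tokenizers, filters)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to a sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Analyze the query into terms and union the posting lists of each term.
    Only documents sharing at least one term with the query can be returned."""
    hits = set()
    for term in analyze(query):
        hits.update(index.get(term, []))
    return sorted(hits)


docs = {1: "red apple pie", 2: "green apple", 3: "blue sky"}
index = build_inverted_index(docs)
print(index["apple"])              # [1, 2]
print(search(index, "apple sky"))  # [1, 2, 3]
print(search(index, "banana"))     # []
```

Note that a query term absent from the index contributes nothing: the structure answers "which documents contain this term?", not "which documents are close to this vector?".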

But KNN requires calculating the distance between two vectors even if they don't share any terms, and a traditional inverted index is not designed for this requirement: a term lookup only ever reaches documents that contain at least one of the query's terms.
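
The gap can be seen with a brute-force kNN sketch over sparse (term, weight) vectors like the ones the question wants to insert. This is an illustration under stated assumptions, not an Elasticsearch feature: the vectors are made-up data, Euclidean distance is an assumed metric, and `euclidean` and `knn` are hypothetical helpers.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between sparse vectors given as {term: weight} dicts.
    (Assumed metric; a missing term counts as weight 0.0.)"""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def knn(query, vectors, k):
    """Brute-force kNN: score every stored vector and keep the k closest ids."""
    return heapq.nsmallest(k, vectors, key=lambda vid: euclidean(query, vectors[vid]))


# Hypothetical user-defined (term, weight) vectors.
vectors = {
    "a": {"x": 1.0, "y": 1.0},
    "b": {"z": 0.1},  # shares no term with the query below
    "c": {"x": 5.0},
}
query = {"x": 1.0, "y": 0.5}

# What a per-term lookup could reach: only vectors sharing a term with the query.
term_matches = [vid for vid, vec in vectors.items() if set(vec) & set(query)]
print(term_matches)            # ['a', 'c']
print(knn(query, vectors, 2))  # ['a', 'b']
```

Vector `b` is the second-nearest neighbour even though it shares no term with the query, so a lookup on `x` and `y` would never retrieve it. The price of the brute-force approach is that every stored vector is scored on every query, which is exactly what becomes hard with a huge data set and a large K.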

As you have said, Elasticsearch can implement real-time KNN search for k = 2 via a geo query, but I don't think it can support k > 2.

By the way, if you find any approach that supports real-time KNN search where K may be very large (100000?) over a huge data set (in number of vectors), please tell me, thanks :)