Off-line clustering using solr?


Problem description

I want to cluster my indexed data in Solr. Each Solr document contains the following fields: id, title, url.

I have read the Solr 7.7 docs, and the clustering algorithm mentioned there is applied only to the search results of each single query. What I need is full-index clustering based on the document title.

Can anyone help?

Answer

As far as I'm aware, there's no out-of-the-box plugin for clustering the whole Solr index.
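That means the usual approach is to export the documents and cluster them outside Solr. Below is a minimal sketch of the export step, using cursorMark deep paging on the standard /select handler; the Solr URL and the collection name (documents) are hypothetical, so adjust them to your setup.

```python
# Sketch: dump id/title pairs out of Solr for offline clustering,
# using cursorMark deep paging on the /select handler.
import json
import requests

# Hypothetical Solr location and collection name.
SOLR_URL = "http://localhost:8983/solr/documents/select"


def export_titles():
    """Yield (id, title) pairs for every document in the index."""
    cursor = "*"
    while True:
        params = {
            "q": "*:*",
            "fl": "id,title",
            "sort": "id asc",        # cursorMark requires a sort on the uniqueKey
            "rows": 1000,
            "cursorMark": cursor,
            "wt": "json",
        }
        resp = requests.get(SOLR_URL, params=params, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        for doc in data["response"]["docs"]:
            title = doc.get("title", "")
            if isinstance(title, list):  # handle multivalued title fields
                title = title[0] if title else ""
            yield doc["id"], title
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:        # cursor stops advancing once the index is exhausted
            break
        cursor = next_cursor


if __name__ == "__main__":
    with open("titles.jsonl", "w", encoding="utf-8") as out:
        for doc_id, title in export_titles():
            out.write(json.dumps({"id": doc_id, "title": title}) + "\n")
```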

If you have some background in machine learning, have a look at Apache Mahout; it should be suitable for clustering a dataset of this size. Alternatively, there's a commercially licensed Carrot2 spin-off we develop called Lingo4G, which is designed for clustering large collections of text. In both cases, however, there is no direct integration with Solr -- you'd need to handle the integration on your own.
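For illustration only, here is a minimal sketch of the offline clustering step itself, using scikit-learn's TF-IDF + k-means as a stand-in for whichever tool you end up choosing (Mahout, Lingo4G, ...). The input file name and the cluster count are assumptions.

```python
# Sketch: cluster the exported titles with TF-IDF + k-means (scikit-learn).
import json

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_titles(path="titles.jsonl", n_clusters=50):
    # n_clusters=50 is an arbitrary assumption; tune it for your data.
    ids, titles = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            ids.append(rec["id"])
            titles.append(rec["title"])

    # Titles are short, so unigrams + bigrams give the vectorizer a bit more signal.
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vectorizer.fit_transform(titles)

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(X)

    # Map cluster label -> document ids; these labels could later be written
    # back to Solr, e.g. as an extra "cluster_id" field via atomic updates.
    clusters = {}
    for doc_id, label in zip(ids, labels):
        clusters.setdefault(int(label), []).append(doc_id)
    return clusters


if __name__ == "__main__":
    for label, members in sorted(cluster_titles().items()):
        print(label, len(members), members[:5])
```

Writing the resulting cluster labels back into the index is a separate step you would also have to handle yourself.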

