Off-line clustering using solr?


Problem description

I want to cluster my indexed data in Solr. Each Solr document contains the following fields: id, title, url.

I have read the Solr 7.7 docs, and the clustering algorithm mentioned there is applied only to the search results of each single query. What I need is full-index clustering based on the document title.

Can anyone help?

Answer

As far as I'm aware, there's no out-of-the-box plugin for clustering the whole Solr index.
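That means the usual approach is to export the documents and cluster them outside Solr. Below is a minimal sketch of the export step, using cursorMark deep paging on the standard /select handler; the Solr URL and the collection name (documents) are hypothetical, so adjust them to your setup.

```python
# Sketch: dump id/title pairs out of Solr for offline clustering,
# using cursorMark deep paging on the /select handler.
import json
import requests

# Hypothetical Solr location and collection name.
SOLR_URL = "http://localhost:8983/solr/documents/select"


def export_titles():
    """Yield (id, title) pairs for every document in the index."""
    cursor = "*"
    while True:
        params = {
            "q": "*:*",
            "fl": "id,title",
            "sort": "id asc",        # cursorMark requires a sort on the uniqueKey
            "rows": 1000,
            "cursorMark": cursor,
            "wt": "json",
        }
        resp = requests.get(SOLR_URL, params=params, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        for doc in data["response"]["docs"]:
            title = doc.get("title", "")
            if isinstance(title, list):  # handle multivalued title fields
                title = title[0] if title else ""
            yield doc["id"], title
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:        # cursor stops advancing once the index is exhausted
            break
        cursor = next_cursor


if __name__ == "__main__":
    with open("titles.jsonl", "w", encoding="utf-8") as out:
        for doc_id, title in export_titles():
            out.write(json.dumps({"id": doc_id, "title": title}) + "\n")
```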

If you have some background in machine learning, have a look at Apache Mahout; it should be suitable for clustering a dataset of this size. Alternatively, there's a commercially licensed Carrot2 spin-off we develop called Lingo4G, which is designed for clustering large collections of text. In both cases, however, there is no direct integration with Solr -- you'd need to handle the integration on your own.
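For illustration only, here is a minimal sketch of the offline clustering step itself, using scikit-learn's TF-IDF + k-means as a stand-in for whichever tool you end up choosing (Mahout, Lingo4G, ...). The input file name and the cluster count are assumptions.

```python
# Sketch: cluster the exported titles with TF-IDF + k-means (scikit-learn).
import json

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_titles(path="titles.jsonl", n_clusters=50):
    # n_clusters=50 is an arbitrary assumption; tune it for your data.
    ids, titles = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            ids.append(rec["id"])
            titles.append(rec["title"])

    # Titles are short, so unigrams + bigrams give the vectorizer a bit more signal.
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vectorizer.fit_transform(titles)

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(X)

    # Map cluster label -> document ids; these labels could later be written
    # back to Solr, e.g. as an extra "cluster_id" field via atomic updates.
    clusters = {}
    for doc_id, label in zip(ids, labels):
        clusters.setdefault(int(label), []).append(doc_id)
    return clusters


if __name__ == "__main__":
    for label, members in sorted(cluster_titles().items()):
        print(label, len(members), members[:5])
```

Writing the resulting cluster labels back into the index is a separate step you would also have to handle yourself.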

