Solr powered Tag Cloud


Question

I seem to be stuck on the logic of a Solr faceting-powered tag cloud. First of all, I'm using OpenNLP to parse my docs and extract the relevant words from them, so every document gets split into n words. Here is basically what my Solr response looks like:

<docID>
<title>My Doc Title</title>
<content>My Doc Title</content>
<date_published>My Doc Title</date_published>
</docID>

I believe there must be a way to integrate the words in here. I first thought of something like this:

<docID>
<title>My Doc Title</title>
<content>My Doc Title</content>
<date_published>My Doc Title</date_published>
<words>word</words>
<words1>word1</words1>
<words2>word2</words2>
<words3>word3</words3>
<wordsN>wordN</wordsN>
</docID>

But faceting wouldn't be possible, as I have no idea how many words fields I would get per docID, and the faceting would then have to be done across fields (which I'm not even sure is possible). I have been looking into possible answers but I seem to be stuck. In the end, I need to facet on the n words of every single doc I have in my index. Thoughts would be highly appreciated.

Recommended answer

I would suggest using a single words field that is multivalued and stores the list of words per document.
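As a rough illustration of that suggestion, the sketch below builds one document whose `words` key holds the whole list of extracted words, then serializes it as a JSON array of the kind Solr's update handler accepts. It assumes the schema declares `words` with `multiValued="true"`; the helper name `build_doc`, the sample ID, and the sample values are hypothetical, and nothing is sent to a server here.

```python
import json


def build_doc(doc_id, title, content, words):
    """Build one Solr document with a single multivalued `words` field.

    Assumption: `words` is declared multiValued="true" in the schema.
    Duplicates are dropped (order preserved), since a field facet counts
    documents per term, not occurrences within a document.
    """
    return {
        "id": doc_id,
        "title": title,
        "content": content,
        "words": list(dict.fromkeys(words)),
    }


# Hypothetical sample document; the words would come from OpenNLP.
doc = build_doc("doc-1", "My Doc Title", "My Doc Title", ["solr", "facet", "solr"])

# JSON array of documents, the shape a Solr JSON update request takes.
payload = json.dumps([doc])
```

However many words a document yields, they all land in the same field, so the number of fields per document stays fixed.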

Having an unbounded number of numbered words fields (words1, words2, ..., wordsN) will complicate things.

If you use a single multivalued words field, you can get all the words along with their frequencies, which should be enough to create the tag cloud.
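A minimal sketch of how those frequencies might be turned into a tag cloud, assuming a standard field facet on `words` and Solr's default JSON response, where each facet field is a flat list alternating term and count. The function names, the pixel range, and the sample response are illustrative assumptions, not part of the original answer.

```python
from urllib.parse import urlencode


def facet_query(field="words", limit=50):
    """Query string for a facet-only request (rows=0 returns no documents)."""
    return urlencode({
        "q": "*:*",
        "rows": 0,
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.mincount": 1,
    })


def parse_facet_counts(response, field="words"):
    """Turn the flat [term, count, term, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


def tag_sizes(counts, min_px=12, max_px=36):
    """Linearly scale each term's count to a font size in pixels."""
    if not counts:
        return {}
    values = [count for _, count in counts]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all counts match
    return {
        term: round(min_px + (count - lo) * (max_px - min_px) / span)
        for term, count in counts
    }


# Hypothetical response body, already decoded from JSON.
sample = {"facet_counts": {"facet_fields": {"words": ["solr", 10, "facet", 4, "cloud", 1]}}}
sizes = tag_sizes(parse_facet_counts(sample))
```

The counts here are the number of documents containing each word, which is usually what a tag cloud weights by; a logarithmic scale may look better than the linear one when a few words dominate.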
