Retrieve analyzed tokens from ElasticSearch documents


Problem description




I'm trying to access the analyzed/tokenized text in my ElasticSearch documents.

I know you can use the Analyze API to analyze arbitrary text according to your analysis modules. So I could copy and paste data from my documents into the Analyze API to see how it was tokenized.

This seems unnecessarily time-consuming, though. Is there any way to instruct ElasticSearch to return the tokenized text in search results? I've looked through the docs and haven't found anything.
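For reference, the manual Analyze API round trip described above can be scripted instead of copied and pasted. The following is only a minimal sketch: the base URL, the index name `articles`, and the field name `title` are hypothetical, and it assumes an Elasticsearch version whose `_analyze` endpoint accepts a JSON body with `field` and `text`.

```python
import json
from urllib import request


def build_analyze_request(base_url, index, field, text):
    """Return (url, body) for an _analyze call using the analyzer mapped to `field`."""
    url = f"{base_url.rstrip('/')}/{index}/_analyze"
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return url, body


def extract_tokens(response):
    """Pull the token strings out of a parsed _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze(base_url, index, field, text):
    """Send the request to a live cluster (assumed reachable at base_url)."""
    url, body = build_analyze_request(base_url, index, field, text)
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return extract_tokens(json.load(resp))
```

Calling `analyze("http://localhost:9200", "articles", "title", some_text)` for each document would still reanalyze the text rather than read what was indexed, which is the limitation the question is about.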

Solution

Have a look at this other answer: elasticsearch - Return the tokens of a field. Unfortunately, it requires reanalyzing the content of your field on the fly using the script provided there.
It should be possible to write a plugin to expose this feature. The idea would be to add two endpoints that:

  • read the Lucene TermsEnum, as the Solr TermsComponent does, which is also useful for auto-suggestions. Note that this would not be per document: it returns every term in the index with its term frequency and document frequency (potentially expensive with many unique terms).
  • read the term vectors, if enabled, as the Solr TermVectorComponent does. This would be per document, but it requires storing the term vectors (you can configure this in your mapping); positions and offsets can also be retrieved if they are enabled.
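The second idea, per-document term vectors, can be sketched as follows. This is not the plugin the answer proposes. It assumes a later Elasticsearch release that ships a `_termvectors` endpoint and uses the `text` field type, and the index, document ID, and field names are hypothetical. The helpers only build the mapping and request, and parse a response of the shape that endpoint returns.

```python
import json


def term_vector_mapping(field):
    """Mapping fragment that stores term vectors with positions and offsets."""
    return {
        "properties": {
            field: {"type": "text", "term_vector": "with_positions_offsets"}
        }
    }


def build_termvectors_request(base_url, index, doc_id, field):
    """Return (url, body) asking for the stored term vectors of one document."""
    url = f"{base_url.rstrip('/')}/{index}/_termvectors/{doc_id}"
    body = {
        "fields": [field],
        "positions": True,
        "offsets": True,
        "term_statistics": True,  # adds document frequency per term
    }
    return url, json.dumps(body).encode("utf-8")


def tokens_in_order(response, field):
    """Rebuild the token stream of `field`, sorted by token position."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    positioned = [
        (tok["position"], term)
        for term, info in terms.items()
        for tok in info.get("tokens", [])
    ]
    return [term for _, term in sorted(positioned)]
```

Because the response is keyed by term rather than by position, `tokens_in_order` sorts on the stored positions to recover the tokens in the order they appeared in the document.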
