analyzed or not_analyzed, what to choose


Question


I'm using only Kibana to search Elasticsearch, and I have several fields that can only take a few values (worst case, servername, with 30 different values).

I do understand what analysis does to bigger, more complex fields, but for the small and simple ones I fail to understand the advantages and disadvantages of analyzed vs. not_analyzed fields.

So what are the benefits of using analyzed and not_analyzed for a "limited set of values" field (for example, servername: server[0-9]*, with no special characters to break on)? What kind of search types will I lose in Kibana? Will I gain any search speed or disk space?

Testing on one of them, I saw that the .raw version of the field is now empty but Kibana still flags the field as analyzed, so I find my tests inconclusive.

Answer

I will try to keep it simple. If you need more clarification, just let me know and I'll elaborate a better answer.

An "analyzed" field is going to create tokens using the analyzer that you defined for that specific field in your mapping. Since you are referring to something without special characters (let's say server[1-9]), assume the default analyzer, which is basically a lowercasing alphanumeric word breaker (that is not its name, just what it does). It is going to tokenize:

this -> HelloWorld123
into -> token1:helloworld123

OR

this -> Hello World 123
into -> token1:hello && token2:world && token3:123

In this case, if you search for HeLlO, it will become "hello" and it will match this document because the token "hello" is there.
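The tokenization described above can be sketched in a few lines of Python. This is only a rough stand-in for the default analyzer, assuming it does nothing more than lowercase the text and split on non-alphanumeric characters; the real analyzer handles Unicode and other cases that this sketch ignores. The function names `analyze` and `matches` are made up for illustration.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer (an assumption, not the real
    implementation): lowercase, then split on runs of non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def matches(query, field_value):
    """A query matches when any analyzed query token is among the field's tokens."""
    field_tokens = set(analyze(field_value))
    return any(tok in field_tokens for tok in analyze(query))

print(analyze("HelloWorld123"))               # ['helloworld123']
print(analyze("Hello World 123"))             # ['hello', 'world', '123']
print(matches("HeLlO", "Hello World 123"))    # True
```

Note that the query goes through the same analysis as the indexed text, which is why the mixed-case search still finds the lowercase token.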

In the case of not_analyzed fields, no tokenizer is applied at all; the whole value is your single token. So:

this -> Hello World 123
into -> token1:(Hello World 123)

If you search that field for "hello world 123", it is not going to match, because the match is case sensitive (you can still use wildcards such as Hello*, though; let's address that another time).
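As a minimal sketch of that behaviour, the snippet below treats the whole value as one token and compares it exactly. The helper names are hypothetical, and Python's `fnmatchcase` is used only as an approximation of wildcard matching (its pattern syntax is not identical to Elasticsearch's).

```python
from fnmatch import fnmatchcase

def index_not_analyzed(value):
    """No tokenizer is applied: the entire value is the only token."""
    return [value]

def exact_match(query, value):
    """Case-sensitive comparison against the single stored token."""
    return query in index_not_analyzed(value)

def wildcard_match(pattern, value):
    """Approximate a wildcard query with case-sensitive glob matching."""
    return any(fnmatchcase(tok, pattern) for tok in index_not_analyzed(value))

stored = "Hello World 123"
print(exact_match("hello world 123", stored))  # False: case differs
print(exact_match("Hello World 123", stored))  # True
print(wildcard_match("Hello*", stored))        # True
```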

In a nutshell:

Use "analyzed" fields for fields that you are going to search and that you want Elasticsearch to score. Example: titles that contain the word "jobs". Query: "title:jobs".

doc1 : title:developer jobs in montreal
doc2 : title:java coder jobs in vancouver
doc3 : title:unix designer jobs in toronto
doc4 : title:database manager vacancies in montreal

This is going to retrieve doc1, doc2 and doc3.

In those cases, "analyzed" fields are what you want.
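To make the title example concrete, here is a small inverted-index sketch: each token points to the documents that contain it, and a query is answered by looking its tokens up. It assumes the same simplified lowercase-and-split analysis as before and does no scoring; `analyze`, `index` and `search` are illustrative names, not Elasticsearch APIs.

```python
import re
from collections import defaultdict

titles = {
    "doc1": "developer jobs in montreal",
    "doc2": "java coder jobs in vancouver",
    "doc3": "unix designer jobs in toronto",
    "doc4": "database manager vacancies in montreal",
}

def analyze(text):
    """Simplified analysis (assumption): lowercase and split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

# Build the inverted index: token -> set of document ids containing it.
index = defaultdict(set)
for doc_id, title in titles.items():
    for token in analyze(title):
        index[token].add(doc_id)

def search(query):
    """Return the ids of documents containing any token of the query."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return sorted(hits)

print(search("jobs"))      # ['doc1', 'doc2', 'doc3']
print(search("montreal"))  # ['doc1', 'doc4']
```

doc4 is not returned for "jobs" because its title says "vacancies"; token matching has no notion of synonyms unless the analyzer is configured for them.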

If you know in advance what kind of data will be in that field and you are going to query for exactly that value, then "not_analyzed" is what you want.

Example:

Get all the logs from server123.

Query: "server:server123".

doc1 :server:server123,log:randomstring,date:01-jan
doc2 :server:server986,log:randomstring,date:01-jan
doc3 :server:server777,log:randomstring,date:01-jan
doc4 :server:server666,log:randomstring,date:01-jan
doc5 :server:server123,log:randomstring,date:02-jan

Results come only from doc1 and doc5.
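For reference, this is roughly what the mapping and query could look like, written as Python dictionaries that serialize to JSON request bodies. It assumes an Elasticsearch version from the era of this answer (1.x/2.x), where string fields accept `"index": "not_analyzed"`; from 5.0 onward the `keyword` field type plays that role. The type name `logs` is hypothetical. The last part simulates the exact-match filter locally on the five sample documents.

```python
import json

# Legacy (1.x/2.x style) mapping; the type name "logs" is a made-up example.
mapping = {
    "mappings": {
        "logs": {
            "properties": {
                "server": {"type": "string", "index": "not_analyzed"},
                "log": {"type": "string"},  # analyzed by default
            }
        }
    }
}

# Exact-value lookup on the not_analyzed field.
query = {"query": {"term": {"server": "server123"}}}

print(json.dumps(mapping, indent=2))
print(json.dumps(query))

# Local simulation of the exact match on the sample documents.
docs = {
    "doc1": {"server": "server123", "date": "01-jan"},
    "doc2": {"server": "server986", "date": "01-jan"},
    "doc3": {"server": "server777", "date": "01-jan"},
    "doc4": {"server": "server666", "date": "01-jan"},
    "doc5": {"server": "server123", "date": "02-jan"},
}
wanted = query["query"]["term"]["server"]
hits = sorted(doc_id for doc_id, doc in docs.items() if doc["server"] == wanted)
print(hits)  # ['doc1', 'doc5']
```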

Well, I hope you get the point. As I said, keeping it simple, it is about what you need.

analyzed -> more space on disk (a LOT more if the analyzed fields are big). analyzed -> more time for indexing. analyzed -> better for matching documents.

not_analyzed -> less space on disk. not_analyzed -> less time for indexing. not_analyzed -> exact match on the field, or wildcards.

Regards,

Daniel
