Pattern matching in Elasticsearch?

Question

Continuing from my earlier post: I have changed the query because, according to femtoRgon's post, some characters and anchors are not supported by Elasticsearch.

I am looking for a way to match a pattern like "xxx-xx-xxxx" in order to find documents that contain social security numbers using Elasticsearch.

Suppose that, among the indexed documents, I would like to find all those that contain a social security number matching the "xxx-xx-xxxx" pattern.

Sample code for indexing the document:

InputStream is = null;
try {
    is = new FileInputStream("/home/admin/Downloads/20121221.doc");
    ContentHandler contenthandler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    Parser parser = new AutoDetectParser();
    parser.parse(is, contenthandler, metadata, new ParseContext());
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (is != null) is.close();
}

Sample code for searching:

QueryBuilder queryBuilderFullText = null;
queryBuilderFullText = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(),
        FilterBuilders.regexpFilter("_all", "[0-9]{3}?[0-9]{2}?[0-9]{4}"));
SearchRequestBuilder requestBuilder;
requestBuilder = client.prepareSearch()
        .setIndices(getDomainIndexId(project))
        .setTypes(getProjectTypeId(project))
        .setQuery(queryBuilderFullText);
SearchResponse response = requestBuilder.execute().actionGet(ES_TIMEOUT_MS);
SearchHits hits = response.getHits();
if (hits.getTotalHits() > 0) {
    System.out.println(hits.getTotalHits());
} else {
    return 0L;
}

I am getting hits for the following:

45-555-5462
457-55-5462
4578-55-5462
457-55-54623
457-55-5462-23

But as per my requirement, it should only return "457-55-5462" (based on pattern matching "xxx-xx-xxxx").

Please help.

Answer

Seeing as ^, $ and \d can't be used, I would do this:

[^0-9-][0-9]{3}-[0-9]{2}-[0-9]{4}[^0-9-]

Or in Java:

FilterBuilders.regexpFilter("_all", "[^0-9-][0-9]{3}-[0-9]{2}-[0-9]{4}[^0-9-]");

This checks that no other digit or dash appears immediately before or after the number. It does require some character before and after the match, though, so it won't capture documents that have the social security number at the very beginning or very end.
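
To see the boundary-character behaviour outside of Elasticsearch, here is a minimal sketch that runs the same expression through plain `java.util.regex` against the sample values from the question. This is only an illustration: `java.util.regex` is not the Lucene regular-expression engine that Elasticsearch uses, so how terms are analyzed and matched in an index may differ. The class name `SsnPatternDemo` and the helper `containsSsn` are hypothetical names introduced for this example.

```java
import java.util.regex.Pattern;

final class SsnPatternDemo {
    // Same expression as in the answer: one character that is neither a digit
    // nor a dash, then ddd-dd-dddd, then another such character.
    private static final Pattern SSN =
            Pattern.compile("[^0-9-][0-9]{3}-[0-9]{2}-[0-9]{4}[^0-9-]");

    // Hypothetical helper: true if the text contains an SSN-shaped number
    // that is delimited on both sides by a non-digit, non-dash character.
    static boolean containsSsn(String text) {
        return SSN.matcher(text).find();
    }

    public static void main(String[] args) {
        // The five hits from the question, padded with spaces so that each
        // number has a character before and after it.
        String[] samples = {
            " 45-555-5462 ",
            " 457-55-5462 ",
            " 4578-55-5462 ",
            " 457-55-54623 ",
            " 457-55-5462-23 ",
            "457-55-5462" // no surrounding characters at all
        };
        for (String sample : samples) {
            System.out.println("[" + sample + "] -> " + containsSsn(sample));
        }
    }
}
```

Of the padded samples, only " 457-55-5462 " is reported as a match. The last, unpadded sample is rejected, which is the limitation described above: the expression needs one character on each side of the number.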

Regex101 demo
