Term frequency of documents with NEST Elasticsearch


Question

I am new to Elasticsearch and want to get the frequencies of the top N terms in the "content" field of a specific document using NEST. I have searched a lot for an answer that works for me, but all I learned is that I should use term vectors rather than a terms facet, because the facet counts terms across the whole set of documents. I know I need to configure the term vector on the field, like below:

[ElasticProperty(Type = Nest.FieldType.attachment, TermVector =Nest.TermVectorOption.with_positions_offsets, Store = true)]
    public Attachment File { get; set; }



I searched a lot for how to get the term frequency of a specific document using NEST, but everything I found was about Lucene and Solr. I need an example in NEST. I appreciate your help.

One more question: the solution suggested by Rob works well when I want the term frequency of a string field such as the title of my documents. But when I change the target field to the content of the documents, I get no results back. To be able to search the content of documents, I followed the answer to "ElasticSearch & attachment type (NEST C#)". That works fine, and I can search for a term in the content of a document, but getting the term frequency does not work. Here is the code for it:

var searchResults = client.TermVector<Document>(t =>t.Id(ID).TermStatistics().Fields(f => f.File));    



Does anyone have a solution for it?

Recommended answer

You can do this with client.TermVector(..). Here is a simple example:

Document class:

public class MyDocument
{
    public int Id { get; set; } 
    [ElasticProperty(TermVector = TermVectorOption.WithPositionsOffsets)]
    public string Description { get; set; }
    [ElasticProperty(Type = FieldType.Attachment, TermVector =TermVectorOption.WithPositionsOffsetsPayloads, Store = true, Index = FieldIndexOption.Analyzed)]
    public Attachment File { get; set; }
}



Index some test data:

var indicesOperationResponse = client.CreateIndex(indexName, c => c
    .AddMapping<MyDocument>(m => m.MapFromAttributes()));

var myDocument = new MyDocument {Id = 1, Description = "test cat test"};
client.Index(myDocument);
client.Index(new MyDocument {Id = 2, Description = "river"});
client.Index(new MyDocument {Id = 3, Description = "test"});
client.Index(new MyDocument {Id = 4, Description = "river"});

client.Refresh();



Retrieve term statistics through NEST:

var termVectorResponse = client.TermVector<MyDocument>(t => t
    .Document(myDocument)
    //.Id(1) //you can specify document by id as well
    .TermStatistics()
    .Fields(f => f.Description));

foreach (var item in termVectorResponse.TermVectors)
{
    Console.WriteLine("Field: {0}", item.Key);

    var topTerms = item.Value.Terms.OrderByDescending(x => x.Value.TotalTermFrequency).Take(10);
    foreach (var term in topTerms)
    {
        Console.WriteLine("{0}: {1}", term.Key, term.Value.TermFrequency);
    }
}



Output:

Field: description
cat: 1
test: 2
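
The loop above sorts by TotalTermFrequency, which in the term vectors API counts occurrences across the whole index, and then prints TermFrequency, which counts occurrences in the one document. If the goal is the top N terms of a single document, sorting by the per-document count may be closer to what was asked. Below is a minimal sketch of that ranking in plain C#. It does not call NEST: TermStats and TopTerms are hypothetical stand-ins for the response data, and the numbers are illustrative values based on the test data indexed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one term's statistics; this is not a NEST type.
public class TermStats
{
    public int TermFrequency { get; set; }      // occurrences in this document
    public int TotalTermFrequency { get; set; } // occurrences across the index
}

public static class TopTerms
{
    // Returns the n terms that occur most often within the document itself.
    // Ties are broken alphabetically so the order is deterministic.
    public static List<KeyValuePair<string, TermStats>> ByDocumentFrequency(
        IDictionary<string, TermStats> terms, int n)
    {
        return terms
            .OrderByDescending(t => t.Value.TermFrequency)
            .ThenBy(t => t.Key, StringComparer.Ordinal)
            .Take(n)
            .ToList();
    }

    public static void Main()
    {
        // Illustrative values: "test cat test" in document 1, "test" again in document 3.
        var terms = new Dictionary<string, TermStats>
        {
            { "cat",  new TermStats { TermFrequency = 1, TotalTermFrequency = 1 } },
            { "test", new TermStats { TermFrequency = 2, TotalTermFrequency = 3 } }
        };

        foreach (var term in ByDocumentFrequency(terms, 10))
        {
            Console.WriteLine("{0}: {1}", term.Key, term.Value.TermFrequency);
        }
    }
}
```

With real data, the same OrderByDescending key can be swapped into the loop over termVectorResponse.TermVectors shown earlier.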

Hope it helps.

UPDATE

When I checked the mapping for the index, one thing was interesting:

{
    "my_index" : {
        "mappings" : {
            "mydocument" : {
                "properties" : {
                    "file" : {
                        "type" : "attachment",
                        "path" : "full",
                        "fields" : {
                            "file" : {
                                "type" : "string"
                            },
                            "author" : {
                                "type" : "string"
                            },
                            "title" : {
                                "type" : "string"
                            },
                            "name" : {
                                "type" : "string"
                            },
                            "date" : {
                                "type" : "date",
                                "format" : "dateOptionalTime"
                            },
                            "keywords" : {
                                "type" : "string"
                            },
                            "content_type" : {
                                "type" : "string"
                            },
                            "content_length" : {
                                "type" : "integer"
                            },
                            "language" : {
                                "type" : "string"
                            }
                        }
                    },
                    "id" : {
                        "type" : "integer"
                    }
                }
            }
        }
    }
}

There is no information about term vectors in it.

When I created the mapping through Sense:

PUT http://localhost:9200/my_index/mydocument/_mapping
{
  "mydocument": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": {
            "type": "string",
            "term_vector":"with_positions_offsets",
            "store": true
          }
        }
      }
    }
  }
}

I was able to retrieve term statistics.

I hope to be back later with a working mapping created through NEST.

UPDATE2

Based on Greg's answer, try this fluent mapping:

var indicesOperationResponse = client.CreateIndex(indexName, c => c
        .AddMapping<MyDocument>(m => m
            .MapFromAttributes()
            .Properties(ps => ps
                .Attachment(s => s.Name(p => p.File)
                    .FileField(ff => ff.Name(f => f.File).TermVector(TermVectorOption.WithPositionsOffsets)))))
    );
