Substring search in RavenDB

Question

I have a set of objects of type Idea:

public class Idea
{
    public string Title { get; set; }
    public string Body { get; set; }
}

I want to search these objects by substring. For example, when I have an object with the title "idea", I want it to be found when I enter any substring of "idea": i, id, ide, idea, d, de, dea, e, ea, a.

I'm using RavenDB to store the data. The search query looks like this:

var ideas = session
              .Query<IdeaByBodyOrTitle.IdeaSearchResult, IdeaByBodyOrTitle>()
              .Where(x => x.Query.Contains(query))
              .As<Idea>()
              .ToList();

The index is defined as follows:

public class IdeaByBodyOrTitle : AbstractIndexCreationTask<Idea, IdeaByBodyOrTitle.IdeaSearchResult>
{
    public class IdeaSearchResult
    {
        public string Query;
        public Idea Idea;
    }

    public IdeaByBodyOrTitle()
    {
        Map = ideas => from idea in ideas
                       select new
                           {
                               Query = new object[] { idea.Title.SplitSubstrings().Concat(idea.Body.SplitSubstrings()).Distinct().ToArray() },
                               idea
                           };
        Indexes.Add(x => x.Query, FieldIndexing.Analyzed);
    }
}

SplitSubstrings() is an extension method that returns all distinct substrings of a given string:

static class StringExtensions
{
    public static string[] SplitSubstrings(this string s)
    {
        s = s ?? string.Empty;
        List<string> substrings = new List<string>();
        for (int i = 0; i < s.Length; i++)
        {                
            for (int j = 1; j <= s.Length - i; j++)
            {
                substrings.Add(s.Substring(i, j));
            }
        }            
        return substrings.Select(x => x.Trim()).Where(x => !string.IsNullOrEmpty(x)).Distinct().ToArray();
    }
}

This is not working, mainly because RavenDB does not recognize the SplitSubstrings() method, which lives in my custom assembly. How can I make this work, that is, how can I get RavenDB to recognize this method? Besides that, is my approach appropriate for this kind of searching (searching by substring)?

Edit

Basically, I want to build an auto-complete feature on top of this search, so it needs to be fast.

Btw: I'm using RavenDB - Build #960

Recommended answer

You can perform a substring search across multiple fields using the following approach:

(1)

public class IdeaByBodyOrTitle : AbstractIndexCreationTask<Idea>
{
    public IdeaByBodyOrTitle()
    {
        Map = ideas => from idea in ideas
                       select new
                           {
                               idea.Title,
                               idea.Body
                           };
    }
}

On this site you can read:

"By default, RavenDB uses a custom analyzer called LowerCaseKeywordAnalyzer for all content. (...) The default values for each field are FieldStorage.No in Stores and FieldIndexing.Default in Indexes."

So by default, if you check the index terms in the Raven client, they look like this:

Title                    Body
------------------       -----------------
"the idea title 1"       "the idea body 1"
"the idea title 2"       "the idea body 2" 

Based on that, a wildcard query can be constructed:

var wildquery = string.Format("*{0}*", QueryParser.Escape(query));

which is then used with the .In and .Where constructions (using the OR operator inside):

// Query the Idea documents through the IdeaByBodyOrTitle index defined above
var ideas = session.Query<Idea, IdeaByBodyOrTitle>()
                   .Where(x => x.Title.In(wildquery) || x.Body.In(wildquery));

(2)

Alternatively, you can use a pure Lucene query:

var ideas = session.Advanced.LuceneQuery<Idea, IdeaByBodyOrTitle>()
                   .Where("(Title:" + wildquery + " OR Body:" + wildquery + ")");

(3)

You can also use the .Search expression, but you have to construct your index differently if you want to search across multiple fields:

public class IdeaByBodyOrTitle : AbstractIndexCreationTask<Idea, IdeaByBodyOrTitle.IdeaSearchResult>
{
    public class IdeaSearchResult
    {
        public string Query;
        public Idea Idea;
    }

    public IdeaByBodyOrTitle()
    {
        Map = ideas => from idea in ideas
                       select new
                           {
                               Query = new object[] { idea.Title, idea.Body },
                               idea
                           };
    }
}

var result = session.Query<IdeaByBodyOrTitle.IdeaSearchResult, IdeaByBodyOrTitle>()
                    .Search(x => x.Query, wildquery, 
                            escapeQueryOptions: EscapeQueryOptions.AllowAllWildcards,
                            options: SearchOptions.And)
                    .As<Idea>();

Summary:

Also keep in mind that *term* is rather expensive, especially the leading wildcard. In this post you can find more info about it. It says that a leading wildcard forces Lucene to do a full scan of the index and can thus drastically slow down query performance. Lucene internally stores its indexes (actually the terms of string fields) sorted alphabetically and "reads" from left to right. That is why a search with a trailing wildcard is fast and a search with a leading one is slow.
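The effect of the sort order can be illustrated with a small sketch that uses neither RavenDB nor Lucene. It assumes the terms sit in an ordinally sorted array; TermScanSketch, PrefixMatches and InfixMatches are hypothetical names made up for this illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermScanSketch
{
    // Trailing wildcard (prefix*): binary-search to the first candidate in the
    // sorted term list, then read forward only while terms still start with the prefix.
    public static List<string> PrefixMatches(string[] sortedTerms, string prefix)
    {
        int pos = Array.BinarySearch(sortedTerms, prefix, StringComparer.Ordinal);
        if (pos < 0) pos = ~pos; // index of the first term >= prefix
        var hits = new List<string>();
        for (int i = pos;
             i < sortedTerms.Length && sortedTerms[i].StartsWith(prefix, StringComparison.Ordinal);
             i++)
        {
            hits.Add(sortedTerms[i]);
        }
        return hits;
    }

    // Leading wildcard (*term*): the sort order offers no starting point,
    // so every term in the list has to be inspected.
    public static List<string> InfixMatches(string[] sortedTerms, string infix)
    {
        return sortedTerms.Where(t => t.Contains(infix)).ToList();
    }
}
```

With the three distinct terms "the idea body 1", "the idea title 1" and "the idea title 2" sorted ordinally, PrefixMatches(terms, "the idea t") jumps straight to the two title terms, whereas InfixMatches(terms, "title") finds the same two terms only after looking at all three.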

So alternatively, x.Title.StartsWith("something") can be used, but this obviously does not search across all substrings. If you need fast search, you can change the Index option for the fields you want to search on to Analyzed, but again it will not search across all substrings.
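Why an analyzed field helps with prefixes of words but still misses arbitrary substrings can be shown with a rough simulation. This is only a sketch: the real analyzers do more than lowercase and split on spaces, and AnalyzedFieldSketch, KeywordTerms, AnalyzedTerms and PrefixHit are hypothetical helper names, not RavenDB or Lucene APIs.

```csharp
using System;
using System.Linq;

public static class AnalyzedFieldSketch
{
    // Stand-in for the default keyword-style indexing:
    // the whole lowercased value becomes a single term.
    public static string[] KeywordTerms(string value) =>
        new[] { value.ToLowerInvariant() };

    // Stand-in for an analyzed field: lowercase, then split on spaces.
    // This is a simplification of what a real analyzer does.
    public static string[] AnalyzedTerms(string value) =>
        value.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // A trailing-wildcard query (prefix*) hits if any term starts with the prefix.
    public static bool PrefixHit(string[] terms, string prefix) =>
        terms.Any(t => t.StartsWith(prefix, StringComparison.Ordinal));
}
```

For the value "The idea title 1", the prefix "ide" misses the single keyword term but hits the analyzed term "idea". The substring "dea" hits neither, which is the limitation described above.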

If there is a space inside the substring query, please check this question for a possible solution. For making suggestions, see http://architects.dzone.com/articles/how-do-suggestions-ravendb.
