Simple rating algorithm to sort results according to user query


Problem description

I'm developing a very basic web search engine that has several parts. After retrieving results for a user query, I want to calculate a rating for each result and then sort the results by that rating. Here is my query:

var tmpQuery = (from urls in _context.Urls
                join documents in _context.Documents
                  on urls.UrlId equals documents.DocumentId
                let words = (from words in _context.Words
                             join hits in _context.Hits
                               on words.WordId equals hits.WordId
                             where hits.DocumentId == documents.DocumentId
                             select words.Text)
                select new { urls, documents, words });

var results = (from r in tmpQuery.AsEnumerable()
               where r.urls.ResolvedPath.Contains(breakedQuery, KeywordParts.Url, part) ||
                     r.documents.Title.Contains(breakedQuery, KeywordParts.Title, part) ||
                     r.documents.Keywords.Contains(breakedQuery, KeywordParts.Keywords, part) ||
                     r.documents.Description.Contains(breakedQuery, KeywordParts.Description, part) ||
                     r.words.Contains(breakedQuery, KeywordParts.Content, part)

                     select new SearchResult()
                     {
                        UrlId = r.urls.UrlId,
                        Url = r.urls.ResolvedPath,
                        IndexedOn = r.documents.IndexedOn,
                        Title = r.documents.Title,
                        Description = r.documents.Description,
                        Host = new Uri(r.urls.ResolvedPath).Host,
                        Length = r.documents.Length,
                        Rate = CalculateRating(breakedQuery, r.urls.ResolvedPath, r.documents.Title, r.documents.Keywords, r.documents.Description, r.words)
                     }).AsEnumerable()
                     .OrderByDescending(result => result.Rate)
                     .Distinct(new SearchResultEqualityComparer());

And the rating is calculated by this method:

private int CalculateRating(IEnumerable<string> breakedQuery, string resolvedPath, string title, string keywords, string description, IEnumerable<string> words)
    {
        var baseRate = 0;

        foreach (var query in breakedQuery)
        {
            /* The raw user query ("Microsoft -Apple") is first split into a list of
               terms ("Microsoft", "-Apple"). A term that starts with "-" means the
               results should not contain it. */
            var none = (query.StartsWith("-"));
            string term = query.TrimStart('-'); // strip only the leading "-" marker, not hyphens inside the term

            var pathCount = Calculate(resolvedPath, term);
            var titleCount = Calculate(title, term);
            var keywordsCount = Calculate(keywords, term);
            var descriptionCount = Calculate(description, term);
            var wordsCount = Calculate(words, term);

            var result = (pathCount * 100) + (titleCount * 50) + (keywordsCount * 25) + (descriptionCount * 10) + (wordsCount);

            if (none)
                baseRate -= result;
            else
                baseRate += result;
        }
        return baseRate;
    }

    private int Calculate(string source, string query)
    {
        if (!string.IsNullOrWhiteSpace(source))
            return Calculate(source.Split(' ').AsEnumerable<string>(), query);
        return 0;
    }

    private int Calculate(IEnumerable<string> sources, string query)
    {
        var count = 0;
        if (sources != null && sources.Any())
        {
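            // Count matches in two passes.
            // First pass: exact (case-sensitive) matches score a full point each.
            var elements = sources.Where(source => source == query);
            count += elements.Count();
            // Second pass: case-insensitive matches score half a point each.
            // Note: Except is a set operation (repeated words count once), and the
            // integer division rounds down.
            count += sources.Except(elements).Where(source => source.ToLowerInvariant() == query.ToLowerInvariant()).Count() / 2;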
        }
        return count;
    }
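
To make the weighting concrete, here is a small worked sketch of the formula used in CalculateRating. The class name RatingExample, the helper Score, and all of the match counts are hypothetical and chosen only for illustration; only the weights (100, 50, 25, 10, 1) come from the code above.

```csharp
using System;

public static class RatingExample
{
    // Weights copied from CalculateRating: URL path 100, title 50,
    // keywords 25, description 10, page content 1.
    public static int Score(int path, int title, int keywords, int description, int words)
    {
        return (path * 100) + (title * 50) + (keywords * 25) + (description * 10) + words;
    }

    public static void Main()
    {
        // Hypothetical counts for the term "Microsoft":
        // 1 hit in the URL, 2 in the title, 4 in the page content.
        int included = Score(1, 2, 0, 0, 4);   // 100 + 100 + 4 = 204

        // Hypothetical counts for the excluded term "-Apple":
        // 1 hit in the description, 3 in the page content.
        int excluded = Score(0, 0, 0, 1, 3);   // 10 + 3 = 13

        // Included terms add to the rating, excluded terms subtract from it.
        Console.WriteLine(included - excluded); // prints 191
    }
}
```

With these made-up counts, a single URL hit outweighs dozens of content hits, which is the effect the weights are meant to have.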

Please guide me on how to improve performance (my search engine is very, very slow).

Recommended answer

I expect this is down to your `from urls in _context.Urls`: with no Where on it, you're fetching a lot of data only to throw most of it away when building up your results. How many items are in tmpQuery / results?
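
A minimal sketch of that idea, under stated assumptions: the Url and Document classes below are simplified stand-ins for the asker's entities, in-memory lists wrapped with AsQueryable() stand in for `_context`, and the plain `string.Contains` call replaces the custom `Contains(breakedQuery, ...)` extension, which is presumably not something a LINQ provider can translate. The point is only the placement of the filter: the `where` clause sits on the IQueryable, before AsEnumerable(), so a real database provider would normally be able to run it on the server and return only matching rows. That server-side behavior is assumed here, not demonstrated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical stand-ins for the entities in the question.
public class Url { public int UrlId { get; set; } public string ResolvedPath { get; set; } }
public class Document { public int DocumentId { get; set; } public string Title { get; set; } }

public static class FilterEarlySketch
{
    public static List<string> Search(IQueryable<Url> urls, IQueryable<Document> documents, string term)
    {
        // The filter is part of the IQueryable, so it is applied before
        // anything is materialized.
        var filtered = from u in urls
                       join d in documents on u.UrlId equals d.DocumentId
                       where u.ResolvedPath.Contains(term) || d.Title.Contains(term)
                       select new { u.ResolvedPath, d.Title };

        // Only the already-filtered rows cross into LINQ to Objects here;
        // rating and sorting would follow on this much smaller set.
        return filtered.AsEnumerable()
                       .Select(r => r.ResolvedPath)
                       .ToList();
    }

    public static void Main()
    {
        var urls = new List<Url>
        {
            new Url { UrlId = 1, ResolvedPath = "http://microsoft.com/" },
            new Url { UrlId = 2, ResolvedPath = "http://example.org/" },
            new Url { UrlId = 3, ResolvedPath = "http://apple.com/" }
        }.AsQueryable();

        var documents = new List<Document>
        {
            new Document { DocumentId = 1, Title = "Home" },
            new Document { DocumentId = 2, Title = "About microsoft products" },
            new Document { DocumentId = 3, Title = "Apple" }
        }.AsQueryable();

        foreach (var path in Search(urls, documents, "microsoft"))
            Console.WriteLine(path);
    }
}
```

Exclusion terms, case handling, and the per-field rating would still need to be layered on top; the sketch only narrows the candidate set before CalculateRating runs.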
