Calculating Word Proximity in an Inverted Index

Posted: 2015/11/30 22:29:26. Tags: algorithm, indexing, search-engine, information-retrieval, inverted-index

Problem description

As part of a search engine, I have developed an inverted index.

So I have a list that contains elements of the following type:

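using System.Collections.Generic;

// One posting: a single word's occurrences in a single document.
public struct ForwardBarrelRecord
{
    public string DocId;             // document the word occurs in
    public int hits { get; set; }    // hit count for this word in DocId
    public List<int> hitLocation;    // positions of the word within DocId
}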

Each record belongs to a single word. hitLocation holds the positions at which that word occurs in the document.

What I want is to measure how close the positions in one record's List<int> hitLocation are to those in another record's List<int> hitLocation, and, if any positions are adjacent, to increase the weight of both records.

The problem I am having is finding a suitable algorithm for this. Any help is appreciated.
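One common way to turn "closeness" into a number is the smallest gap between any position of one word and any position of the other; a gap of 1 means the words are adjacent. The sketch below is an illustration, not the asker's or answerer's code: the class and method names (Proximity, MinDistance) are hypothetical, and it assumes both position lists are already sorted ascending. It uses the same two-pointer walk as the answer, always advancing the pointer at the smaller position, because that position cannot get any closer to later (larger) positions on the other side.

```csharp
using System;
using System.Collections.Generic;

public static class Proximity
{
    // Hypothetical helper: smallest |p1 - p2| over p1 in a and p2 in b.
    // Assumes both lists are sorted ascending. Returns int.MaxValue if either is empty.
    public static int MinDistance(IReadOnlyList<int> a, IReadOnlyList<int> b)
    {
        int i = 0, j = 0;
        int best = int.MaxValue;
        while (i < a.Count && j < b.Count)
        {
            int gap = Math.Abs(a[i] - b[j]);
            if (gap < best) best = gap;

            // The smaller position is finished: every later element on the
            // other side is even larger, so the gap can only grow.
            if (a[i] < b[j]) i++; else j++;
        }
        return best;
    }
}
```

For example, MinDistance over positions { 1, 10, 20 } and { 5, 12 } gives 2 (from 10 and 12). A caller could then treat a result of 1 as "adjacent" and scale a proximity weight down as the gap grows.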

Recommended answer

This is easiest if the hitLocation lists are in sorted order. So start with:

using System.Linq;  // needed for OrderBy and ToList

var word1List = word1.hitLocation.OrderBy(s => s).ToList();
var word2List = word2.hitLocation.OrderBy(s => s).ToList();

That said, for a search engine you will probably want those lists to be stored pre-sorted in the inverted index.
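Pre-sorted lists usually come for free at indexing time: if a document's tokens are visited in order and each token's position is appended to its word's list, every list is built in ascending order. The sketch below assumes naive whitespace tokenization and case-insensitive word matching; the names PositionIndexer and BuildPositions are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class PositionIndexer
{
    // Hypothetical helper: maps each word of one document to its hit positions.
    // Positions are appended in increasing order, so each list is already sorted
    // and no OrderBy is needed at query time.
    public static Dictionary<string, List<int>> BuildPositions(string text)
    {
        // Assumption: words are compared case-insensitively.
        var positions = new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);

        // Assumption: split on whitespace only; a real engine would also
        // strip punctuation, normalize, stem, etc.
        string[] tokens = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        for (int pos = 0; pos < tokens.Length; pos++)
        {
            if (!positions.TryGetValue(tokens[pos], out var list))
            {
                list = new List<int>();
                positions[tokens[pos]] = list;
            }
            list.Add(pos);
        }
        return positions;
    }
}
```

Each resulting list could be stored as the hitLocation of the corresponding ForwardBarrelRecord, with hits set to its Count.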

In any case, once the lists are sorted, finding matches is easy: walk both lists once, always advancing the index that points at the smaller position.

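// Merge-style walk over the two sorted position lists.
int ix1 = 0;
int ix2 = 0;
while (ix1 < word1List.Count && ix2 < word2List.Count)
{
    int hit1 = word1List[ix1];
    int hit2 = word2List[ix2];
    if (hit1 < hit2)
    {
        // word1 at hit1 immediately followed by word2 at hit2
        if ((hit2 - hit1) == 1)
        {
            Console.WriteLine("Match at {0} and {1}", hit1, hit2);
        }
        ix1++;  // hit1 cannot be adjacent to any later (larger) hit2
    }
    else
    {
        ix2++;  // hit2 <= hit1, so move on to word2's next position
    }
}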

That locates occurrences of word1 immediately followed by word2. If you also want word2 followed by word1, you can put a similar check (hit1 - hit2 == 1) in the else clause.
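Putting the suggested else-clause check into the loop gives a single pass that finds adjacency in both orders. The sketch below counts matches instead of printing them, so the result can feed the weight the question asks about; the names Adjacency and CountAdjacent are hypothetical, and both lists are assumed to be sorted ascending.

```csharp
using System;
using System.Collections.Generic;

public static class Adjacency
{
    // Hypothetical helper: number of adjacent pairs in either order,
    // found with one merge walk over two sorted position lists.
    public static int CountAdjacent(IReadOnlyList<int> word1List, IReadOnlyList<int> word2List)
    {
        int ix1 = 0, ix2 = 0, matches = 0;
        while (ix1 < word1List.Count && ix2 < word2List.Count)
        {
            int hit1 = word1List[ix1];
            int hit2 = word2List[ix2];
            if (hit1 < hit2)
            {
                if (hit2 - hit1 == 1) matches++;  // word1 immediately followed by word2
                ix1++;
            }
            else
            {
                if (hit1 - hit2 == 1) matches++;  // word2 immediately followed by word1
                ix2++;
            }
        }
        return matches;
    }
}
```

The else-clause check is safe because, whenever it runs, hit1 is the smallest word1 position that is at least hit2, so a word1 hit at hit2 + 1 cannot be skipped. When applying the count as a weight, note that ForwardBarrelRecord is a struct: writing records[i].hits++ on a List<ForwardBarrelRecord> does not compile (error CS1612). Copy the element into a local, modify it, and assign it back to records[i]. The same applies to any weight field you add, since the struct shown has none.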
