Storing an inverted index

Views: 90  Published: 2020/6/26 19:19:55  Tags: python, information-retrieval, inverted-index

Problem description

I am working on an information retrieval project. I have built a full inverted index using Hadoop/Python. Hadoop outputs the index as (word, documentlist) pairs, which are written to a file. For quick access, I have created a dictionary (hash table) from that file. My question is: how do I store such an index on disk so that it still has quick access time? At present I store the dictionary with Python's pickle module and load it from there, but that brings the whole index into memory at once (or does it?). Please suggest an efficient way of storing and searching through the index.
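To make the "or does it?" part concrete: pickle.load() deserializes the whole pickled object, so a single pickled dictionary does come back into memory in full. The sketch below shows that round trip next to the standard-library shelve module, which keeps one record per key in a dbm file and unpickles only the entry that is looked up. It is offered as one possible illustration, not as the recommended answer; the toy index contents, file names, and temporary directory are assumptions made up for the example.

```python
import os
import pickle
import shelve
import tempfile

# Toy index in the shape described in the question (contents are made up).
index = {
    "hadoop": {"doc1": [0, 7], "doc3": [2]},
    "python": {"doc2": [4]},
}

workdir = tempfile.mkdtemp()  # hypothetical location for the example files

# pickle: the file holds one serialized object, so load() rebuilds all of it.
pickle_path = os.path.join(workdir, "index.pkl")
with open(pickle_path, "wb") as f:
    pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)
with open(pickle_path, "rb") as f:
    loaded = pickle.load(f)  # the entire dictionary is now in memory

# shelve: each word becomes a separate record in a dbm file on disk.
shelf_path = os.path.join(workdir, "index_shelf")
with shelve.open(shelf_path) as db:
    for word, postings in index.items():
        db[word] = postings

# Reopen read-only and look up a single word.
with shelve.open(shelf_path, flag="r") as db:
    hadoop_docs = sorted(db["hadoop"].keys())  # only this entry is unpickled
    has_java = "java" in db                    # membership test by key
```

Whether shelve is fast enough depends on the size of the index and on the dbm backend available on the machine, so it is best treated as a starting point to measure against.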

My dictionary structure is as follows (using nested dictionaries):

{word: {doc1: [locations], doc2: [locations], ...}}

so that I can get the documents containing a word with dictionary[word].keys(), and so on.
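The nested structure above can be sketched as follows. The exact format of the Hadoop output file is not given in the question, so this example starts from a hypothetical list of (word, document, position) tuples; build_index and the sample data are illustrative names, not part of the original code.

```python
from collections import defaultdict

# Hypothetical occurrences; the real ones would come from the Hadoop output.
occurrences = [
    ("index", "doc1", 3),
    ("index", "doc1", 15),
    ("index", "doc2", 8),
    ("search", "doc2", 1),
]

def build_index(occurrences):
    """Build {word: {doc: [locations]}} from (word, doc, position) tuples."""
    index = defaultdict(lambda: defaultdict(list))
    for word, doc, pos in occurrences:
        index[word][doc].append(pos)
    # Convert to plain dicts: a defaultdict with a lambda factory cannot be pickled.
    return {word: dict(docs) for word, docs in index.items()}

dictionary = build_index(occurrences)

docs_with_index = list(dictionary["index"].keys())  # documents containing "index"
positions = dictionary["index"]["doc1"]             # locations of "index" in doc1
missing = dictionary.get("hadoop", {})              # word absent from the index
```

Using .get() with an empty default avoids a KeyError when a query word does not appear in any document.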
