Using cPickle to serialize a large dictionary causes MemoryError

Posted: 2020/5/27 20:22:04 · Tags: python, serialization, pickle, inverted-index

Problem description

I'm writing an inverted index for a search engine on a collection of documents. Right now, I'm storing the index as a dictionary of dictionaries. That is, each keyword maps to a dictionary of docIDs->positions of occurrence.

The data model looks something like: {word : { doc_name : [location_list] } }
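
To make the data model concrete, here is a minimal sketch that builds an index of that shape from a tiny, made-up collection. The helper name `build_index`, the sample documents, and the naive lowercase whitespace tokenization are assumptions for illustration, not the asker's code.

```python
from collections import defaultdict


def build_index(docs):
    """Hypothetical helper: build {word: {doc_name: [positions]}}.

    `docs` maps doc_name -> text; tokenization is a naive lowercase
    whitespace split, assumed here only for illustration.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_name, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_name].append(position)
    # Convert to plain dicts: a defaultdict whose factory is a lambda
    # cannot be pickled, which matters once the index is written to disk.
    return {word: dict(postings) for word, postings in index.items()}


docs = {"a.txt": "the cat sat", "b.txt": "the cat and the hat"}
index = build_index(docs)
print(index["the"])  # {'a.txt': [0], 'b.txt': [0, 3]}
```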

Building the index in memory works fine, but when I try to serialize to disk, I hit a MemoryError. Here's my code:

import sys
import cPickle

# Write the index out to disk; the with-block closes the file afterwards
with open(sys.argv[3], 'wb') as serializedIndex:
    cPickle.dump(index, serializedIndex, cPickle.HIGHEST_PROTOCOL)
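
For readers on Python 3: the `cPickle` module no longer exists there, and the `pickle` module uses its C implementation automatically when available. A minimal sketch of the same write, plus a read-back to confirm the round trip, might look like this; the tiny sample index and the temporary path (standing in for `sys.argv[3]`) are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Small stand-in for the real {word: {doc_name: [positions]}} index
index = {"cat": {"a.txt": [1], "b.txt": [1]}, "hat": {"b.txt": [4]}}

# Temporary file standing in for sys.argv[3]
path = os.path.join(tempfile.mkdtemp(), "index.pkl")

# Write the index out to disk; `with` closes (and flushes) the file afterwards
with open(path, "wb") as f:
    pickle.dump(index, f, pickle.HIGHEST_PROTOCOL)

# Read it back to confirm the round trip
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == index)  # True
```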

Right before serialization, my program is using about 50% of memory (1.6 GB). As soon as I make the call to cPickle, my memory usage skyrockets to 80% before crashing.

Why is cPickle using so much memory for serialization? Is there a better way to be approaching this problem?
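
One known contributor to the extra memory is the pickler's memo: during a single dump call, the pickler records every object it has already written so that shared and recursive references are preserved. For a dict of dicts of lists with a very large number of entries, that table can add substantial memory on top of the index itself. A commonly suggested workaround (sketched here under assumptions, not taken from the original thread) is to write one independent pickle per keyword, so each call's memo only covers one postings dict, and to read the records back with repeated `pickle.load` calls until `EOFError`. The helper names `dump_index_incrementally` and `load_index_incrementally` are hypothetical.

```python
import io
import pickle


def dump_index_incrementally(index, f):
    """Hypothetical helper: write each (word, postings) pair as its own pickle.

    Every pickle.dump call uses a fresh Pickler, so its memo only ever
    holds the objects belonging to one word's postings dict.
    """
    for word, postings in index.items():
        pickle.dump((word, postings), f, pickle.HIGHEST_PROTOCOL)


def load_index_incrementally(f):
    """Hypothetical helper: yield (word, postings) pairs until end of file."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


# Usage with an in-memory buffer standing in for the index file
index = {"cat": {"a.txt": [1], "b.txt": [1]}, "hat": {"b.txt": [4]}}
buf = io.BytesIO()
dump_index_incrementally(index, buf)
buf.seek(0)
restored = dict(load_index_incrementally(buf))
print(restored == index)  # True
```

The trade-off: because records are pickled independently, objects shared between words (for example the same doc-name string) are written once per record, so the file gets larger and the reloaded index holds separate copies. The index also still has to fit in memory while it is being written; this only avoids one large memo covering the whole structure.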
