Store large dictionary to file in Python

Problem description

I have a dictionary with many entries and a huge vector as each value. These vectors can be 60,000 dimensions long, and I have about 60,000 entries in the dictionary. To save time, I want to store this after computation. However, using pickle produced a huge file. I have tried storing to JSON, but the file remains extremely large (around 10.5 MB on a sample of 50 entries with fewer dimensions). I have also read about sparse matrices. As most entries will be 0, this is a possibility. Will this reduce the file size? Is there any other way to store this information? Or am I just unlucky?

Update:

Thank you all for the replies. I want to store this data because these are word counts. For example, when given sentences, I store the number of times word 0 (at location 0 in the array) appears in the sentence. There are obviously more words across all sentences than appear in any one sentence, hence the many zeros. Then, I want to use this array to train at least three, maybe six, classifiers. It seemed easier to create the arrays with word counts and then run the classifiers overnight to train and test. I use sklearn for this. This format was chosen to be consistent with other feature vector formats, which is why I am approaching the problem this way. If this is not the way to go, please let me know. I am very much aware that I have much to learn about coding efficiently!
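
Since these are per-sentence word counts destined for sklearn, one option is to let sklearn build the sparse count matrix directly instead of constructing dense arrays first. A minimal sketch, assuming scikit-learn's CountVectorizer (the sentences and variable names are illustrative, not from the original question):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative sentences; in practice these would be the real corpus.
sentences = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat chased the dog",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)  # scipy.sparse CSR matrix of word counts

print(X.shape)  # (3, vocabulary_size)
print(X.nnz)    # number of stored (nonzero) counts -- the zeros cost nothing
```

The resulting CSR matrix can be passed straight to most sklearn classifiers, so the dense 60,000-dimension arrays never need to exist in memory or on disk.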

I also started implementing sparse matrices. The file is even bigger now (testing with a sample set of 300 sentences).
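
A common reason a "sparse" file comes out larger is that the matrix gets serialized in a dense or generic-pickle form. A sketch of a compact round trip, assuming scipy's save_npz/load_npz (available in scipy 0.19+; the shapes and file name are illustrative):

```python
import numpy as np
from scipy import sparse

# Illustrative mostly-zero count matrix: 300 sentences x 60,000 words.
rng = np.random.default_rng(0)
dense = np.zeros((300, 60_000), dtype=np.int64)
rows = rng.integers(0, 300, size=3_000)
cols = rng.integers(0, 60_000, size=3_000)
dense[rows, cols] = rng.integers(1, 6, size=3_000)

counts = sparse.csr_matrix(dense)
sparse.save_npz("counts.npz", counts)      # stores only the nonzeros, compressed
restored = sparse.load_npz("counts.npz")   # round-trips as a CSR matrix
assert (restored != counts).nnz == 0
```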

Update 2: Thank you all for the tips. John Mee was right that I did not need to store the data. Both he and Mike McKerns told me to use sparse matrices, which sped up calculation significantly! So thank you for your input. Now I have a new tool in my arsenal!

Answer

See my answer to a very closely related question (https://stackoverflow.com/a/25244747/2379433) if you are OK with pickling to several files instead of a single file.

Also see https://stackoverflow.com/a/21948720/2379433 for other potential improvements, as well as https://stackoverflow.com/a/24471659/2379433.

If you are using numpy arrays, storage can be very efficient, as both klepto and joblib understand how to use a minimal state representation for an array. If most elements of your arrays are indeed zeros, then by all means convert to sparse matrices... and you will see huge savings in the storage size of the array.
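
As an illustration of combining the two ideas, a sparse conversion plus joblib's built-in compression looks roughly like this; a sketch, assuming joblib's dump/load with its compress option (the array contents and file name are illustrative):

```python
import numpy as np
import joblib
from scipy import sparse

# Illustrative array that is almost entirely zeros.
arr = np.zeros((50, 60_000), dtype=np.int64)
arr[0, :10] = 1

# compress=3 applies zlib compression level 3 to the pickled payload.
joblib.dump(sparse.csr_matrix(arr), "counts.pkl", compress=3)
restored = joblib.load("counts.pkl").toarray()
assert np.array_equal(arr, restored)
```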

As the links above discuss, you could use klepto -- which gives you the ability to easily store dictionaries to disk or to a database, using a common API. klepto also lets you pick a storage format (pickle, json, etc.) -- with HDF5 coming soon. It can utilize both specialized pickle formats (like numpy's) and compression (if you care about size rather than speed).
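
A short sketch of the kind of klepto usage described, assuming its dir_archive backend (one serialized file per key on disk; the archive name and keys are illustrative):

```python
from klepto.archives import dir_archive

# 'wordcounts' becomes a directory on disk, one pickled file per key.
db = dir_archive("wordcounts", {}, serialized=True, cached=True)
db["sentence_0"] = [0, 2, 0, 0, 1]   # e.g. a count vector
db["sentence_1"] = [1, 0, 0, 3, 0]
db.dump()    # flush the in-memory cache to the archive on disk
del db

# Reopen the archive later and pull everything back into memory.
db = dir_archive("wordcounts", {}, serialized=True, cached=True)
db.load()
print(db["sentence_0"])
```

With cached=False, reads and writes go directly against the on-disk archive instead of an in-memory cache, which keeps memory flat for dictionaries that do not fit in RAM.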

klepto gives you the option to store the dictionary as an "all-in-one" file or as one file per entry, and it can also leverage multiprocessing or multithreading -- meaning you can save and load dictionary items to/from the backend in parallel.
