Integrating multiple dictionaries in Python (big data)


Problem description


I am working on a research project in big data mining. I have written code to organize my data into a dictionary. However, the amount of data is so huge that my computer runs out of memory while building the dictionary. I need to periodically write the dictionary to disk and create multiple dictionaries this way. I then need to compare the resulting dictionaries, update the keys and values accordingly, and store the whole thing in one big dictionary on disk. Any idea how I can do this in Python? I need an API that can quickly write a dict to disk, then compare two dicts and update keys. I can actually write the code to compare two dicts; that's not the problem. I just need to do it without running out of memory.


My dict looks like this: "orange": ["It is a fruit", "It is very tasty", ...]
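One standard-library option for the spill-and-merge step described above is the shelve module, which keeps a dict-like mapping on disk. Below is a minimal sketch, not a definitive implementation: the sample pairs, the shelf file name merged_dict, and the batch threshold are all illustrative assumptions.

import shelve

# Illustrative sample; in practice this would be the mining output stream.
pairs = [
    ("orange", "It is a fruit"),
    ("orange", "It is very tasty"),
    ("apple", "It is a fruit"),
]

BATCH_LIMIT = 100_000  # assumed threshold; tune to available memory

def spill(batch, path="merged_dict"):
    """Merge an in-memory batch into the on-disk shelf,
    extending existing value lists instead of overwriting them."""
    with shelve.open(path) as db:
        for key, values in batch.items():
            db[key] = db.get(key, []) + values

batch = {}
for key, sentence in pairs:
    batch.setdefault(key, []).append(sentence)
    if len(batch) >= BATCH_LIMIT:
        spill(batch)   # write the partial dictionary to disk
        batch.clear()  # free memory before continuing
spill(batch)           # flush whatever is left

Only one batch ever lives in memory at a time; the merge logic (extending the value lists) runs against the on-disk shelf, which is the part that would otherwise exhaust RAM.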

Recommended answer


Agree with Hoffman: go for a relational database. Data processing is a somewhat unusual task for a relational engine, but believe me, it is a good compromise between ease of use/deployment and speed on large datasets.


I customarily use sqlite3, which ships with Python, although I more often use it through apsw. The advantage of a relational engine like sqlite3 is that you can instruct it to do a lot of processing with your data through joins and updates, and it will handle all the memory/disk swapping of the required data in quite a sensible manner. You can also use in-memory databases to hold small data that needs to interact with your big data, and link them through "ATTACH" statements. I have processed gigabytes this way.
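A minimal sketch of that pattern with the built-in sqlite3 module: an on-disk table holds the big key/value data, and a small lookup table lives in an attached in-memory database so SQLite can join across the two. The file name big_data.db, the table names, and the sample rows are illustrative assumptions.

import sqlite3

# On-disk database holds the big key -> sentence data.
con = sqlite3.connect("big_data.db")
con.execute("CREATE TABLE IF NOT EXISTS facts (key TEXT, value TEXT)")
con.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("orange", "It is a fruit"), ("orange", "It is very tasty")],
)

# Attach a separate in-memory database for small working data.
con.execute("ATTACH DATABASE ':memory:' AS mem")
con.execute("CREATE TABLE mem.wanted (key TEXT PRIMARY KEY)")
con.execute("INSERT INTO mem.wanted VALUES ('orange')")

# SQLite joins across the attached databases and takes care of
# paging data to and from disk as needed.
for key, value in con.execute(
    "SELECT f.key, f.value FROM facts f JOIN mem.wanted w ON f.key = w.key"
):
    print(key, value)

con.commit()
con.close()

The point of the design is that the merge/update logic moves into SQL, so the engine, not your Python process, decides how much of the dataset is held in memory at any moment.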

