What are the efficient ways to parse / process huge JSON files in Python?


Problem Description

For my project I have to parse two big JSON files, one 19.7 GB and the other 66.3 GB. The structure of the JSON data is very complex: a dictionary at the first level, which may again contain lists or dictionaries at the second level. These are all network log files, and I have to parse them and do analysis. Is converting such big JSON files to CSV advisable?

When I converted the smaller 19.7 GB JSON file to CSV, the result had around 2,000 columns and 0.5 million rows. I am using pandas to parse the data. I have not yet touched the bigger 66.3 GB file. Am I going in the right direction? I have no idea how many columns and rows will come out when I convert that bigger file.
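For reference, a minimal sketch of this chunked pandas-to-CSV conversion, assuming the logs are (or can be exported as) newline-delimited JSON; the file names and chunk size here are illustrative:

```python
import pandas as pd

# Read the 19.7 GB file 100,000 rows at a time instead of all at once;
# chunksize requires lines=True (one JSON record per line).
reader = pd.read_json("network_logs.jsonl", lines=True, chunksize=100_000)

first_chunk = True
for chunk in reader:
    # Append each chunk to the CSV, writing the header only once.
    chunk.to_csv("network_logs.csv", mode="a", header=first_chunk, index=False)
    first_chunk = False
```

Note that nested second-level dictionaries or lists would land in the CSV as stringified objects; pandas.json_normalize can flatten them first.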

Kindly suggest any other good options, if they exist. Or is it advisable to read directly from the JSON file and apply OOP concepts to it?

I have already read these articles: article 1 from Stack Overflow and article 2 from Quora.

Recommended Answer

You might want to use dask. It has a similar syntax to pandas, only it's parallel (essentially lots of parallel pandas DataFrames) and lazy (which helps avoid RAM limitations).
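For example, here is a minimal toy sketch of what "parallel and lazy" means in practice (the data is made up purely for illustration):

```python
import pandas as pd
import dask.dataframe as dd

# A Dask DataFrame is a collection of ordinary pandas DataFrames
# ("partitions") that can be processed in parallel.
pdf = pd.DataFrame({"bytes_sent": range(1_000_000)})
ddf = dd.from_pandas(pdf, npartitions=8)

# This only builds a task graph -- nothing is computed yet (lazy).
total = ddf["bytes_sent"].sum()

# compute() executes the graph, aggregating across the 8 partitions.
print(total.compute())  # 499999500000
```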

You could use the read_json method and then do your calculations on the resulting dataframe.
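A hedged sketch of that route, assuming the log files are (or can be exported as) newline-delimited JSON; the file pattern and the "status" column are hypothetical:

```python
import dask.dataframe as dd

# blocksize splits each file into ~256 MB chunks that are parsed in
# parallel, so neither the 19.7 GB nor the 66.3 GB file is loaded whole.
# blocksize only works with line-delimited JSON (lines=True).
ddf = dd.read_json("network_logs_*.json", lines=True, blocksize=2**28)

# Build the calculation lazily, then trigger it with compute().
status_counts = ddf["status"].value_counts().compute()
print(status_counts)
```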
