Pandas vs JSON library to read a JSON file in Python


Question

It seems that I can use either pandas or json to read a JSON file, i.e.

import pandas as pd
pd_example = pd.read_json('some_json_file.json')

or equivalently

import json
# Use a context manager so the file handle is closed after reading
with open('some_json_file.json') as f:
    json_example = json.load(f)

So my question is: what's the difference, and which one should I use? Is one way recommended over the other? Are there situations where one is better than the other? Thanks.

Recommended answer

When you have a single JSON structure inside a JSON file, use read_json because it loads the JSON directly into a DataFrame. With json.load/json.loads, you have to load it into a Python dictionary/list first and then into a DataFrame, which is an unnecessary two-step process.
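As a rough illustration of the one-step versus two-step routes, here is a minimal sketch. The sample records and the temporary file are assumptions made up for the example, not part of the original question.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data: a flat list of records
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "some_json_file.json")
    with open(path, "w") as f:
        json.dump(records, f)

    # One step: parse the file straight into a DataFrame
    df_direct = pd.read_json(path)

    # Two steps: parse into Python objects, then build the DataFrame
    with open(path) as f:
        data = json.load(f)
    df_two_step = pd.DataFrame(data)

print(df_direct)
print(df_two_step)
```

Both routes should produce the same rows and columns for a simple structure like this one. The two-step route is mainly useful when you need to inspect or reshape the Python objects before building the DataFrame.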

Of course, this assumes that the structure can be parsed directly into a DataFrame. For non-trivial structures (usually complex nested lists of dicts), you may want to use json_normalize instead.
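For the nested case, a small sketch of pd.json_normalize might look like the following. The nested records are hypothetical and only show how nested keys are flattened into dotted column names.

```python
import pandas as pd

# Hypothetical nested records (a list of dicts containing dicts)
data = [
    {"id": 1, "user": {"name": "a", "address": {"city": "X"}}},
    {"id": 2, "user": {"name": "b", "address": {"city": "Y"}}},
]

# Nested keys become dot-separated column names
df = pd.json_normalize(data)
print(df.columns.tolist())  # ['id', 'user.name', 'user.address.city']
```

If the file holds such a structure, you would typically load it with json.load first and pass the resulting list to json_normalize.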

On the other hand, with a JSON lines file, the story is different. In my experience, loading a JSON lines file with pd.read_json(..., lines=True) is actually slightly slower on large data (tested once on ~50k+ records), and to make matters worse, it cannot handle rows with errors: the entire read operation fails. In contrast, you can call json.loads on each line of the file inside a try-except block to get robust code that actually ends up a bit faster. Go figure.
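The line-by-line approach could be sketched roughly as follows. The helper name load_json_lines and the sample lines are assumptions for illustration; the sketch skips unparsable rows and records their line numbers rather than failing the whole read.

```python
import json


def load_json_lines(lines):
    """Parse an iterable of JSON lines, skipping rows that fail to parse."""
    records, bad_lines = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad_lines.append(lineno)  # remember which row was malformed
    return records, bad_lines


# Hypothetical input: the second line is truncated on purpose
sample = ['{"id": 1}', '{"id": 2', '', '{"id": 3}']
records, bad_lines = load_json_lines(sample)
print(records)    # [{'id': 1}, {'id': 3}]
print(bad_lines)  # [2]
```

With a real file, you can pass the open file object directly (for example, inside a `with open(path) as f:` block), since iterating over it yields one line at a time. The resulting list of records can then be handed to pd.DataFrame if a DataFrame is needed.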

Use whatever fits the situation.
