Pandas vs JSON library to read a JSON file in Python
Question
It seems that I can use either pandas or the json module to read a JSON file, e.g.
import pandas as pd
pd_example = pd.read_json('some_json_file.json')
or, equivalently,
import json
# Use a context manager so the file handle is closed after reading
with open('some_json_file.json') as f:
    json_example = json.load(f)
So my question is: what's the difference, and which one should I use? Is one approach recommended over the other, and are there situations where one is better than the other? Thanks.
Recommended answer
When you have a single JSON structure inside a JSON file, use read_json, because it loads the JSON directly into a DataFrame. With json.load, you have to load it into a Python dictionary/list first and then into a DataFrame: an unnecessary two-step process.
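A minimal sketch of the two routes side by side, assuming a hypothetical file whose content is a flat list of records. The JSON text is inlined through io.StringIO so the example runs without a file on disk; in practice you would pass the path (or an open file) instead.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for some_json_file.json: a flat list of records,
# which maps one-to-one onto DataFrame rows.
RAW = '[{"name": "a", "value": 1}, {"name": "b", "value": 2}]'

# One step: pandas parses the JSON text straight into a DataFrame.
df_direct = pd.read_json(io.StringIO(RAW))

# Two steps: json builds Python lists/dicts, then pandas wraps them.
records = json.load(io.StringIO(RAW))
df_two_step = pd.DataFrame(records)

# For this simple structure both routes yield the same rows.
same = df_direct.to_dict(orient="records") == df_two_step.to_dict(orient="records")
print(df_direct)
print(same)
```

The io.StringIO wrapper matters because recent pandas versions deprecate passing a literal JSON string directly to pd.read_json.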
Of course, this assumes that the structure can be parsed directly into a DataFrame. For non-trivial structures (usually complex nested lists of dicts), you may want to use json_normalize instead.
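A minimal sketch of pd.json_normalize on nested records. The data, field names, and the record_path/meta choices are illustrative assumptions, not part of the original answer. Nested dicts are flattened into dotted column names, and record_path expands a nested list into one row per element.

```python
import pandas as pd

# Hypothetical nested records: dicts inside dicts, plus a nested list.
data = [
    {"id": 1, "user": {"name": "a", "address": {"city": "X"}},
     "orders": [{"sku": "p1", "qty": 2}, {"sku": "p2", "qty": 1}]},
    {"id": 2, "user": {"name": "b", "address": {"city": "Y"}},
     "orders": [{"sku": "p3", "qty": 5}]},
]

# Nested dicts become dotted columns such as "user.address.city";
# the "orders" list is left unexpanded in a single object column.
flat = pd.json_normalize(data)
print(flat.columns.tolist())

# record_path explodes the nested list into one row per order,
# and meta copies the parent "id" onto each of those rows.
orders = pd.json_normalize(data, record_path="orders", meta=["id"])
print(orders)
```

Plain pd.DataFrame(data) would instead leave "user" as a column of dicts, which is why the flattening step helps for this kind of input.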
On the other hand, with a JSON Lines file, the story is different. In my experience, loading a JSON Lines file with pd.read_json(..., lines=True) is actually slightly slower on large data (tested once on ~50k+ records), and, to make matters worse, it cannot handle rows with errors: the entire read operation fails. In contrast, you can call json.loads on each line of the file inside a try/except block, which gives you more robust code that actually ends up being a little faster. Go figure.
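A minimal sketch of the per-line approach, using a hypothetical helper read_jsonl_skip_bad. Skipping malformed rows and recording their line numbers is just one possible policy (you might log or re-raise instead). The sample input is inlined, and its second line is deliberately broken to show how the strict pd.read_json(..., lines=True) call behaves by comparison.

```python
import io
import json

import pandas as pd


def read_jsonl_skip_bad(lines):
    """Hypothetical helper: parse JSON Lines, skipping rows that fail to decode."""
    records, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(lineno)  # remember which line was malformed
    return pd.DataFrame(records), bad


# Second line is missing its closing brace.
RAW = '{"a": 1}\n{"a": 2\n{"a": 3}\n'

df, bad_lines = read_jsonl_skip_bad(io.StringIO(RAW))

strict_failed = False
try:
    pd.read_json(io.StringIO(RAW), lines=True)
except ValueError:
    # One malformed line makes the whole strict read fail.
    strict_failed = True

print(df["a"].tolist(), bad_lines, strict_failed)
```

With a real file you would iterate over open(path, encoding="utf-8") instead of the StringIO object. Whether this loop is actually faster than read_json will depend on your data and pandas version, so it is worth timing on your own files.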
Use whatever fits the situation.