How to read multiple JSON files into a pandas DataFrame?


Question

I'm having a hard time loading multiple line delimited JSON files into a single pandas dataframe. This is the code I'm using:

import os, json
import pandas as pd
import numpy as np
import glob
pd.set_option('display.max_columns', None)

temp = pd.DataFrame()

path_to_json = '/Users/XXX/Desktop/Facebook Data/*' 

json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

for file in file_list:
    data = pd.read_json(file, lines=True)
    temp.append(data, ignore_index = True)

It looks like all the files are loading when I look through file_list, but I cannot figure out how to get each file into a single dataframe. There are about 50 files, each with a couple of lines.

Recommended answer

Change the last line to:

temp = temp.append(data, ignore_index = True)

This is necessary because the append does not happen in place. The append method does not modify the data frame; it returns a new data frame containing the result of the append operation.
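The "returns a new object" behaviour can be seen in a small sketch. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the sketch below uses pd.concat, which also returns a new data frame and leaves its inputs untouched. The column name "a" and the values are made up for illustration.

```python
import pandas as pd

temp = pd.DataFrame({"a": [1]})   # hypothetical starting frame
data = pd.DataFrame({"a": [2]})   # hypothetical frame to add

# The result is computed but thrown away, so temp is unchanged.
pd.concat([temp, data], ignore_index=True)
rows_when_discarded = len(temp)

# Assigning the result back is what actually grows temp.
temp = pd.concat([temp, data], ignore_index=True)
rows_when_assigned = len(temp)

print(rows_when_discarded, rows_when_assigned)
```

On pandas versions that still ship DataFrame.append, the same reasoning applies to temp.append(data) versus temp = temp.append(data).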

Since writing this answer I have learned that you should never use DataFrame.append inside a loop because it leads to quadratic copying (see this answer).
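To get a feel for what "quadratic copying" means, here is a rough back-of-the-envelope sketch. It uses a simplified cost model (an assumption, not a measurement): every append builds a new frame holding all rows seen so far, while a single concat copies each row once. The figure of 3 rows per file is an assumed stand-in for "a couple lines".

```python
def rows_copied_with_loop_append(n_files, rows_per_file):
    """Rows written if each iteration rebuilds the whole frame."""
    total = 0
    current_rows = 0
    for _ in range(n_files):
        current_rows += rows_per_file
        total += current_rows  # the new frame holds every row so far
    return total


def rows_copied_with_single_concat(n_files, rows_per_file):
    """Rows written if all frames are concatenated in one operation."""
    return n_files * rows_per_file


loop_cost = rows_copied_with_loop_append(50, 3)      # 3 * (1 + 2 + ... + 50)
concat_cost = rows_copied_with_single_concat(50, 3)  # 50 * 3
print(loop_cost, concat_cost)
```

Under this model the loop cost grows with the square of the number of files, while the single concat grows linearly.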

What you should do instead is first create a list of data frames and then use pd.concat to concatenate them all in a single operation. Like this:

dfs = [] # an empty list to store the data frames
for file in file_list:
    data = pd.read_json(file, lines=True) # read data frame from json file
    dfs.append(data) # append the data frame to the list

temp = pd.concat(dfs, ignore_index=True) # concatenate all the data frames in the list.

This alternative should be considerably faster.
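For anyone who wants to try the approach end to end, here is a minimal self-contained sketch. The helper name read_json_lines_dir, the temporary directory, and the sample records are all hypothetical; sorting the file list and returning an empty frame when nothing matches are additions not in the original answer (pd.concat raises an error on an empty list).

```python
import glob
import json
import os
import tempfile

import pandas as pd


def read_json_lines_dir(directory):
    """Read every *.json file (one JSON object per line) into one frame."""
    pattern = os.path.join(directory, "*.json")
    file_list = sorted(glob.glob(pattern))  # sorted for a stable row order
    dfs = [pd.read_json(path, lines=True) for path in file_list]
    if not dfs:
        return pd.DataFrame()  # nothing matched the pattern
    return pd.concat(dfs, ignore_index=True)


# Build three small line-delimited JSON files with made-up records.
demo_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(demo_dir, f"part{i}.json"), "w") as fh:
        fh.write(json.dumps({"id": 2 * i, "name": "a"}) + "\n")
        fh.write(json.dumps({"id": 2 * i + 1, "name": "b"}) + "\n")

combined = read_json_lines_dir(demo_dir)
print(combined.shape)
```

Because ignore_index=True is passed, the combined frame gets a fresh 0..n-1 index instead of repeating each file's own row numbers.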
