快速将JSON列转换为Pandas数据框 [英] Fast convert JSON column into Pandas dataframe

查看:136
本文介绍了快速将JSON列转换为Pandas数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从一列存储为JSON的数据库(超过5万行)中读取数据.我想将其提取到pandas数据框中. 下面的代码片段可以正常工作,但是效率很低,并且在对整个数据库运行时会花费很多时间. 请注意,并非所有项目都具有相同的属性,并且JSON具有一些嵌套的属性.

I'm reading data from a database (50k+ rows) where one column is stored as JSON. I want to extract that into a pandas dataframe. The snippet below works fine but is fairly inefficient and really takes forever when run against the whole db. Note that not all the items have the same attributes and that the JSON have some nested attributes.

我怎样才能更快?

import pandas as pd
import json

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2', \
                 header=None, index_col=0, names=['data'])

df.data.apply(json.loads) \
       .apply(pd.io.json.json_normalize)\
       .pipe(lambda x: pd.concat(x.values))
###this returns a dataframe where each JSON key is a column

推荐答案

json_normalize takes an already processed json string or a pandas series of such strings.

pd.io.json.json_normalize(df.data.apply(json.loads))


设置


setup

import pandas as pd
import json

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2', \
                 header=None, index_col=0, names=['data'])

这篇关于快速将JSON列转换为Pandas数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆