Convert a column of JSON strings into columns of data

Problem description

I have a big dataframe of around 30000 rows with a single column containing JSON strings. Each JSON string contains a number of variables and their values. I want to break each JSON string down into columns of data.

Two rows look like this:

0 {"a":"1","b":"2","c":"3"}
1 {"a" :"4","b":"5","c":"6"}

I want to convert this into a dataframe like:

a   b   c
1   2   3
4   5   6

Please help.

Recommended answer

Your column values seem to have an extra number before the actual JSON string, so you may want to strip that out first (skip to Method if that isn't the case).

One way to do that is to apply a function to the column:

import pandas as pd

# constructing the df
df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'],['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])

# print(df)
#                            json
# 0   0 {"a":"1","b":"2","c":"3"}
# 1  1 {"a" :"4","b":"5","c":"6"}

# function to remove the number: keep everything from the first "{" onward
import re

def split_num(val):
    p = re.compile(r"(\{.*)")
    return p.search(val).group(1)

# applying the function
df['json'] = df['json'].map(split_num)
print(df)

#                          json
# 0   {"a":"1","b":"2","c":"3"}
# 1  {"a" :"4","b":"5","c":"6"}
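
The same prefix removal can probably also be done without a helper function, using the pandas string accessor. The following is a minimal sketch, assuming every cell contains exactly one JSON object delimited by the first "{" and the last "}"; the name `cleaned` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['0 {"a":"1","b":"2","c":"3"}'], ['1 {"a" :"4","b":"5","c":"6"}']],
    columns=['json'],
)

# Capture everything from the first "{" to the last "}" in each cell.
# expand=False returns a Series because the pattern has a single group.
cleaned = df['json'].str.extract(r'(\{.*\})', expand=False)
print(cleaned.tolist())
```

Cells that do not match the pattern come back as NaN, so it may be worth checking `cleaned.isna().sum()` before moving on.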


Method:

Once the df is in the above format, the following converts each row entry to a dictionary:

import json
df['json'] = df['json'].map(json.loads)  # json.loads parses JSON safely; eval would execute arbitrary code
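
With around 30000 rows, some entries may not be valid JSON (for example, a `;` typed where a `:` belongs), and a single bad row would abort the whole conversion. The sketch below shows one possible way to tolerate that, assuming the standard-library `json` module is used for parsing; `parse_or_none` is a hypothetical helper name, not part of pandas or the original answer.

```python
import json

def parse_or_none(text):
    """Return the parsed dict, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError: malformed string; TypeError: non-string cell such as NaN
        return None

rows = ['{"a":"1","b":"2","c":"3"}', '{"a" ;"4","b":"5","c":"6"}']
parsed = [parse_or_none(r) for r in rows]
print(parsed)
```

On a dataframe this could be used as `df['json'].map(parse_or_none)`, after which the rows that came back as None can be inspected or dropped.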

Then applying pd.Series to the column does the job:

d = df['json'].apply(pd.Series)
print(d)
#   a  b  c
# 0  1  2  3
# 1  4  5  6
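
Applying pd.Series row by row can be slow on a frame of this size. As an alternative sketch (not the answer's original method), the parsed dictionaries can be collected into a list and passed to the DataFrame constructor in one call. This assumes the column already holds clean JSON strings; `records` and `wide` are illustrative names.

```python
import json
import pandas as pd

df = pd.DataFrame(
    {'json': ['{"a":"1","b":"2","c":"3"}', '{"a" :"4","b":"5","c":"6"}']}
)

# Parse each string into a dict, then build the frame from the list of dicts.
records = df['json'].map(json.loads).tolist()
wide = pd.DataFrame(records, index=df.index)
print(wide)
```

The values stay strings here because they are quoted in the JSON; if numbers are wanted, something like `wide.astype(int)` can be applied afterwards.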
