Add column from pandas DataFrame into deeply nested JSON at a specific object level
Question
Assume I have a DataFrame df like:
source  tables  columns  data_type  length  RecordCount
src1    table1  col1     INT        4       71
src1    table1  col2     CHAR       2       71
src1    table2  col1     CHAR       2       43
src2    table1  col1     INT        4       21
src2    table1  col2     DATE       3       21
I need output that looks similar to:
{
    "src1": {
        "table1": {
            "Record Count": 71,  # missing in my current code output
            "col1": {
                "type": "INT",
                "length": 4
            },
            "col2": {
                "type": "CHAR",
                "length": 2
            }
        },
        "table2": {
            "Record Count": 43,  # missing in my current code output
            "col1": {
                "type": "CHAR",
                "length": 2
            }
        }
    },
    "src2": {
        "table1": {
            "Record Count": 21,  # missing in my current code output
            "col1": {
                "type": "INT",
                "length": 4
            },
            "col2": {
                "type": "DATE",
                "length": 3
            }
        }
    }
}
Current code:
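from collections import defaultdict

def make_nested(df):
    f = lambda: defaultdict(f)
    data = f()
    for row in df.to_numpy().tolist():
        t = data
        # Walk down the source and table levels
        for index, r in enumerate(row[:-4]):
            t = t[r]
            if index == 1:
                # Meant to add the record count, but ':' makes this an
                # annotation rather than an assignment, so nothing is stored
                t[row[-5]]: {
                    "Record Count": row[-1]
                }
        t[row[-4]] = {
            "type": row[-3],
            "length": row[-2]
        }
    return data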
Recommended answer
Here is another solution that uses two groupby steps.
# First, group by ['source', 'tables'] to handle 'Record Count'.
# The {**a, **b} dict merge requires Python 3.5+; on older versions,
# merge the two dicts another way.
df_new = df.groupby(['source', 'tables']).apply(
    lambda x: {**{'Record Count': x.iloc[0, -1]},
               **{x.iloc[i, -4]: {'type': x.iloc[i, -3], 'length': x.iloc[i, -2]}
                  for i in range(len(x))}}
).reset_index()
   source  tables  0
0  src1    table1  {'Record Count': 71, 'col1': {'type': 'INT', 'length': 4}, 'col2': {'type': 'CHAR', 'length': 2}}
1  src1    table2  {'Record Count': 43, 'col1': {'type': 'CHAR', 'length': 2}}
2  src2    table1  {'Record Count': 21, 'col1': {'type': 'INT', 'length': 4}, 'col2': {'type': 'DATE', 'length': 3}}
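The first step relies on the {**a, **b} syntax to put 'Record Count' ahead of the per-column entries. As a minimal sketch of one way to do the same merge without that syntax (the variable names record, columns and merged are illustrative and not part of the original answer), copy the first dict and update it with the second:

```python
# Illustrative values taken from the src1/table1 rows of the example DataFrame
record = {"Record Count": 71}
columns = {
    "col1": {"type": "INT", "length": 4},
    "col2": {"type": "CHAR", "length": 2},
}

# Copy the first dict, then add the second dict's keys to the copy.
# The inputs are left unmodified.
merged = dict(record)
merged.update(columns)
```

On Python 3.7+ dicts preserve insertion order, so "Record Count" stays the first key, matching the desired output.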
# Second groupby
df_final = df_new.groupby('source').apply(lambda x: {x.iloc[i,-2]: x.iloc[i,-1] for i in range(len(x))})
output = df_final.to_json()
The output is a JSON-encoded string. To get an indented version, parse it and write it back out with an indent:
import json
temp = json.loads(output)
with open('somefile', 'w') as f:
    json.dump(temp, f, indent=4)
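For comparison, here is a self-contained sketch that builds the same nested structure with a single groupby and plain dicts, addressing columns by name instead of by position. It is an alternative to the two-step approach above, not the answer's own method; the function name make_nested is reused from the question, and the DataFrame is rebuilt from the sample rows so the snippet runs on its own.

```python
import json

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src2", "src2"],
    "tables": ["table1", "table1", "table2", "table1", "table1"],
    "columns": ["col1", "col2", "col1", "col1", "col2"],
    "data_type": ["INT", "CHAR", "CHAR", "INT", "DATE"],
    "length": [4, 2, 2, 4, 3],
    "RecordCount": [71, 71, 43, 21, 21],
})


def make_nested(frame):
    """Build {source: {table: {"Record Count": n, column: {...}}}}."""
    nested = {}
    for (source, table), group in frame.groupby(["source", "tables"], sort=False):
        # Assumes RecordCount is constant within a table, so the first row is used
        entry = {"Record Count": int(group["RecordCount"].iloc[0])}
        for name, dtype, length in zip(
            group["columns"], group["data_type"], group["length"]
        ):
            # int() converts NumPy integers to plain ints for the json module
            entry[name] = {"type": dtype, "length": int(length)}
        nested.setdefault(source, {})[table] = entry
    return nested


result = make_nested(df)
text = json.dumps(result, indent=4)  # indented JSON string
```

Because the result is an ordinary dict, json.dumps can indent it directly, with no need to round-trip through to_json and json.loads.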