Combine two pandas DataFrames into one DataFrame with "dict type cells" (pd.Panel deprecated)
Problem description
I'm trying to combine multiple pandas.DataFrame objects so they can be saved in MongoDB in a single collection. All the DataFrames have the same index and columns, and I want to save them in just one document using the to_json() method. Having every cell of the DataFrame hold a dict is probably a good approach. To accomplish that, I want to combine the DataFrames like this:
df1:
index A B
1 'A1' 'B1'
2 'A2' 'B2'
3 'A3' 'B3'
df2:
index A B
1 'a1' 'b1'
2 'a2' 'b2'
3 'a3' 'b3'
Desired result:
df_sol:
index A B
1 {d1:'A1', d2:'a1'} {d1:'B1', d2:'b1'}
2 {d1:'A2', d2:'a2'} {d1:'B2', d2:'b2'}
3 {d1:'A3', d2:'a3'} {d1:'B3', d2:'b3'}
The approach I'm using is:
pd.Panel(dict(d1=df1, d2=df2)).apply(pd.Series.to_dict, 0)
A B
index
1 {'d1': 'A1', 'd2': 'a1'} {'d1': 'B1', 'd2': 'b1'}
2 {'d1': 'A2', 'd2': 'a2'} {'d1': 'B2', 'd2': 'b2'}
3 {'d1': 'A3', 'd2': 'a3'} {'d1': 'B3', 'd2': 'b3'}
but pd.Panel is deprecated:
DeprecationWarning: Panel is deprecated and will be removed in a future version.
Is there a workaround using just pandas?
Thanks!
Recommended answer
Solution
pd.concat + other stuff
pd.Series(
pd.concat([df1, df2], axis=1, keys=['d1', 'd2']).stack().to_dict('index')
).unstack()
A B
1 {'d1': ''A1'', 'd2': ''a1''} {'d1': ''B1'', 'd2': ''b1''}
2 {'d1': ''A2'', 'd2': ''a2''} {'d1': ''B2'', 'd2': ''b2''}
3 {'d1': ''A3'', 'd2': ''a3''} {'d1': ''B3'', 'd2': ''b3''}
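For reference, here is a minimal self-contained sketch of the one-liner above. The frame construction is an assumption (the original post does not show how df1 and df2 were built), and the names combined, stacked, and df_sol are only illustrative. The doubled quotes in the printed output above presumably appear because the answer's sample values contained literal quote characters; this sketch uses plain strings instead.

```python
import pandas as pd

# Assumed sample data: plain strings, same index and columns in both frames.
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=[1, 2, 3])
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}, index=[1, 2, 3])

# Side-by-side concat; keys become the outer column level (d1, d2).
combined = pd.concat([df1, df2], axis=1, keys=["d1", "d2"])

# Move the inner column level (A, B) into the index.
stacked = combined.stack()

# One dict per (row, column) pair, then pivot the column labels back out.
df_sol = pd.Series(stacked.to_dict("index")).unstack()

print(df_sol)
```

Each cell of df_sol is then an ordinary Python dict keyed by the names passed to keys.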
Explanation
I want to get [1, 2, 3] and ['A', 'B'] into the index and ['d1', 'd2'] as the columns.
I start with pd.concat:
pd.concat([df1, df2], axis=1, keys=['d1', 'd2'])
d1 d2
A B A B
index
1 'A1' 'B1' 'a1' 'b1'
2 'A2' 'B2' 'a2' 'b2'
3 'A3' 'B3' 'a3' 'b3'
Which almost gets me there. If I follow that with a stack, it will move the last level of the columns into the last level of the index:
pd.concat([df1, df2], axis=1, keys=['d1', 'd2']).stack()
d1 d2
index
1 A 'A1' 'a1'
B 'B1' 'b1'
2 A 'A2' 'a2'
B 'B2' 'b2'
3 A 'A3' 'a3'
B 'B3' 'b3'
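As a quick check of the concat and stack steps, the sketch below inspects the index and column levels after each one. The sample frames are an assumption (plain strings rather than the quoted values shown in the output above), and the variable names are only illustrative.

```python
import pandas as pd

# Assumed sample data with identical index and columns.
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=[1, 2, 3])
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}, index=[1, 2, 3])

combined = pd.concat([df1, df2], axis=1, keys=["d1", "d2"])
# Columns are a two-level MultiIndex: outer level d1/d2, inner level A/B.
print(combined.columns.nlevels)

stacked = combined.stack()
# The inner column level has moved into the index, so the index now has
# two levels (row label, original column) and the columns are d1/d2.
print(stacked.index.nlevels)
print(list(stacked.columns))
```

Depending on the installed pandas version, stack may emit a FutureWarning about its new implementation; for this data the result should be the same either way.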
The stacked frame is what I want. From here I can use .to_dict('index'):
pd.concat([df1, df2], axis=1, keys=['d1', 'd2']).stack().to_dict('index')
{(1, 'A'): {'d1': "'A1'", 'd2': "'a1'"},
(1, 'B'): {'d1': "'B1'", 'd2': "'b1'"},
(2, 'A'): {'d1': "'A2'", 'd2': "'a2'"},
(2, 'B'): {'d1': "'B2'", 'd2': "'b2'"},
(3, 'A'): {'d1': "'A3'", 'd2': "'a3'"},
(3, 'B'): {'d1': "'B3'", 'd2': "'b3'"}}
And pass that back to the pd.Series constructor to get a series of dictionaries:
pd.Series(
pd.concat([df1, df2], axis=1, keys=['d1', 'd2']).stack().to_dict('index')
)
1 A {'d1': ''A1'', 'd2': ''a1''}
B {'d1': ''B1'', 'd2': ''b1''}
2 A {'d1': ''A2'', 'd2': ''a2''}
B {'d1': ''B2'', 'd2': ''b2''}
3 A {'d1': ''A3'', 'd2': ''a3''}
B {'d1': ''B3'', 'd2': ''b3''}
dtype: object
The only thing left to do is unstack, which I show in the solution above.
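Since the question's goal was to serialize the result with to_json(), the following sketch shows one way the dict-celled frame might be turned into a single JSON document. It assumes plain-string sample data and uses orient="index", which is just one possible layout; the actual MongoDB insert is not shown.

```python
import json

import pandas as pd

# Assumed sample data with identical index and columns.
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=[1, 2, 3])
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}, index=[1, 2, 3])

df_sol = pd.Series(
    pd.concat([df1, df2], axis=1, keys=["d1", "d2"]).stack().to_dict("index")
).unstack()

# Serialize row by row; each cell's dict becomes a nested JSON object.
payload = df_sol.to_json(orient="index")

# Parse it back into a plain dict, e.g. to use as a single document.
document = json.loads(payload)
print(document["1"]["A"])
```

Note that JSON object keys are always strings, so the integer row labels come back as "1", "2", and "3".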