Convert pandas dataframe of lists to dict of dataframes


Problem description

I have a dataframe (with a DateTime index) in which some of the columns contain lists, each with 6 elements.

In: dframe.head()
Out: 
                           A                                        B  \
timestamp                                                                
2017-05-01 00:32:25        30  [-3512, 375, -1025, -358, -1296, -4019]   
2017-05-01 00:32:55        30  [-3519, 372, -1026, -361, -1302, -4020]   
2017-05-01 00:33:25        30  [-3514, 371, -1026, -360, -1297, -4018]   
2017-05-01 00:33:55        30  [-3517, 377, -1030, -363, -1293, -4027]   
2017-05-01 00:34:25        30  [-3515, 372, -1033, -361, -1299, -4025]   
                                                      C           D
timestamp                                                             
2017-05-01 00:32:25  [1104, 1643, 625, 1374, 5414, 2066]      49.93   
2017-05-01 00:32:55  [1106, 1643, 622, 1385, 5441, 2074]      49.94   
2017-05-01 00:33:25  [1105, 1643, 623, 1373, 5445, 2074]      49.91   
2017-05-01 00:33:55  [1105, 1646, 620, 1384, 5438, 2076]      49.91   
2017-05-01 00:34:25  [1104, 1645, 613, 1374, 5431, 2082]      49.94   

I have a dictionary dict_of_dfs which I want to populate with 6 dataframes,

dict_of_dfs = {1: df1, 2:df2, 3:df3, 4:df4, 5:df5, 6:df6}

where the ith dataframe contains the ith items from each list, so the first dataframe in the dict will be:

In:df1
Out: 
                            A          B      C        D
    timestamp                                                                
    2017-05-01 00:32:25        30  -3512   1104    49.93
    2017-05-01 00:32:55        30  -3519   1106    49.94
    2017-05-01 00:33:25        30  -3514   1105    49.91
    2017-05-01 00:33:55        30  -3517   1105    49.91
    2017-05-01 00:34:25        30  -3515   1104    49.94

and so on. The actual dataframe has more columns than this and thousands of rows. What's the simplest, most Pythonic way to make the conversion?

Solution

You can use a dict comprehension with assign, and select the i-th value of each list with str[i-1] (str[0] for the first value, str[1] for the second, and so on):

N = 6
dfs = {i: df.assign(B=df['B'].str[i - 1], C=df['C'].str[i - 1]) for i in range(1, N + 1)}

print(dfs[1])
             timestamp   A     B     C      D
0  2017-05-01 00:32:25  30 -3512  1104  49.93
1  2017-05-01 00:32:55  30 -3519  1106  49.94
2  2017-05-01 00:33:25  30 -3514  1105  49.91
3  2017-05-01 00:33:55  30 -3517  1105  49.91
4  2017-05-01 00:34:25  30 -3515  1104  49.94
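
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The frame is rebuilt by hand from the first three rows shown in the question, so the construction code is an assumption about how the data is laid out, not the asker's actual loading code.

```python
import pandas as pd

# Hand-built sample mirroring the first three rows of the question's frame.
df = pd.DataFrame(
    {
        "A": [30, 30, 30],
        "B": [
            [-3512, 375, -1025, -358, -1296, -4019],
            [-3519, 372, -1026, -361, -1302, -4020],
            [-3514, 371, -1026, -360, -1297, -4018],
        ],
        "C": [
            [1104, 1643, 625, 1374, 5414, 2066],
            [1106, 1643, 622, 1385, 5441, 2074],
            [1105, 1643, 623, 1373, 5445, 2074],
        ],
        "D": [49.93, 49.94, 49.91],
    },
    index=pd.to_datetime(
        ["2017-05-01 00:32:25", "2017-05-01 00:32:55", "2017-05-01 00:33:25"]
    ),
)
df.index.name = "timestamp"

N = 6  # every list is assumed to hold exactly N elements

# Keys run from 1 to N; .str[i - 1] picks the i-th element of each list.
dfs = {
    i: df.assign(B=df["B"].str[i - 1], C=df["C"].str[i - 1])
    for i in range(1, N + 1)
}

print(dfs[1])
```

Because assign returns a new frame each time, the original df keeps its list columns and the DateTime index is carried over to every frame in the dict.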

Another solution, which handles every column whose first value is a list without naming the columns:

dfs = {i:df.apply(lambda x: x.str[i-1] if type(x.iat[0]) == list else x) for i in range(1,7)}

print(dfs[1])
             timestamp   A     B     C      D
0  2017-05-01 00:32:25  30 -3512  1104  49.93
1  2017-05-01 00:32:55  30 -3519  1106  49.94
2  2017-05-01 00:33:25  30 -3514  1105  49.91
3  2017-05-01 00:33:55  30 -3517  1105  49.91
4  2017-05-01 00:34:25  30 -3515  1104  49.94
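
Since the real frame has more columns than the sample, it may help to combine the two ideas: detect the list columns once, then use assign for only those. The sketch below is one possible way to do that. It assumes the first row is representative (a column holds lists if its first value is a list) and that all lists have the same length; the names list_cols and n are made up for illustration.

```python
import pandas as pd

# Small stand-in frame with two list columns and two scalar columns.
df = pd.DataFrame(
    {
        "A": [30, 30],
        "B": [[-3512, 375, -1025], [-3519, 372, -1026]],
        "C": [[1104, 1643, 625], [1106, 1643, 622]],
        "D": [49.93, 49.94],
    }
)

# Assumption: the first row tells us which columns hold lists.
list_cols = [c for c in df.columns if isinstance(df[c].iat[0], list)]

# Assumption: every list has the same length as the first one found.
n = len(df[list_cols[0]].iat[0])

# Build the dict; only the detected list columns are replaced.
dfs = {
    i: df.assign(**{c: df[c].str[i - 1] for c in list_cols})
    for i in range(1, n + 1)
}

print(list_cols)
print(dfs[2])
```

Detecting the columns outside the comprehension means the type check runs once per column rather than once per column per key.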

Timings:

df = pd.concat([df]*10000).reset_index(drop=True)

In [185]: %timeit {i:df.assign(B=df['B'].str[i-1], C=df['C'].str[i-1]) for i in range(1,N+1)}
1 loop, best of 3: 420 ms per loop

In [186]: %timeit {i:df.apply(lambda x: x.str[i-1] if type(x.iat[0]) == list else x) for i in range(1,7)}
1 loop, best of 3: 447 ms per loop

In [187]: %timeit {(i+1):df.applymap(lambda x: x[i] if type(x) == list else x) for i in range(6)}
1 loop, best of 3: 881 ms per loop
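
The timings above come from an IPython session. To repeat the comparison in a plain script, something like the following sketch with the standard-library timeit module could be used. The sample frame, the function names, and the repeat counts are all assumptions chosen for illustration, and the numbers will differ by machine and pandas version. The element-wise variant uses DataFrame.map when it exists, since applymap is deprecated in recent pandas releases.

```python
import timeit

import pandas as pd

small = pd.DataFrame(
    {
        "A": [30, 30],
        "B": [[-3512, 375, -1025, -358, -1296, -4019],
              [-3519, 372, -1026, -361, -1302, -4020]],
        "C": [[1104, 1643, 625, 1374, 5414, 2066],
              [1106, 1643, 622, 1385, 5441, 2074]],
        "D": [49.93, 49.94],
    }
)
# Repeat the rows to get a frame large enough to time (size is arbitrary).
big = pd.concat([small] * 1000).reset_index(drop=True)
N = 6


def with_assign(df):
    return {i: df.assign(B=df["B"].str[i - 1], C=df["C"].str[i - 1])
            for i in range(1, N + 1)}


def with_apply(df):
    return {i: df.apply(lambda x: x.str[i - 1] if type(x.iat[0]) == list else x)
            for i in range(1, N + 1)}


def with_elementwise(df):
    # DataFrame.map replaces applymap in newer pandas; fall back if absent.
    elementwise = df.map if hasattr(df, "map") else df.applymap
    return {i + 1: elementwise(lambda x: x[i] if type(x) == list else x)
            for i in range(N)}


for func in (with_assign, with_apply, with_elementwise):
    seconds = timeit.timeit(lambda: func(big), number=3) / 3
    print(f"{func.__name__}: {seconds * 1000:.1f} ms per call")
```

All three functions return the same values, so the choice between them comes down to speed and whether the list columns are known in advance.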
