How can I split a column of tuples in a Pandas dataframe?

Problem description

I have a Pandas dataframe (this is only a small piece of it):

>>> d1
   y norm test  y norm train  len(y_train)  len(y_test)  
0    64.904368    116.151232          1645          549
1    70.852681    112.639876          1645          549

                                    SVR RBF  
0   (35.652207342877873, 22.95533537448393)
1  (39.563683797747622, 27.382483096332511)

                                        LCV  
0  (19.365430594452338, 13.880062435173587)
1  (19.099614489458364, 14.018867136617146)

                                   RIDGE CV  
0  (4.2907610988480362, 12.416745648065584)
1    (4.18864306788194, 12.980833914392477)

                                         RF  
0   (9.9484841581029428, 16.46902345373697)
1  (10.139848213735391, 16.282141345406522)

                                           GB  
0  (0.012816232716538605, 15.950164822266007)
1  (0.012814519804493328, 15.305745202851712)

                                             ET DATA
0  (0.00034337162272515505, 16.284800366214057)  j2m
1  (0.00024811554516431878, 15.556506191784194)  j2m
>>>

I want to split all the columns that contain tuples. For example, I want to replace the column LCV with the columns LCV-a and LCV-b.

How can I do that?

Solution

You can do this by building a new DataFrame from the column's list of tuples, pd.DataFrame(col.tolist(), index=col.index), and assigning its columns back:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a':[1,2], 'b':[(1,2), (3,4)]})

In [3]: df
Out[3]:
   a       b
0  1  (1, 2)
1  2  (3, 4)

In [4]: df['b'].tolist()
Out[4]: [(1, 2), (3, 4)]

In [5]: pd.DataFrame(df['b'].tolist(), index=df.index)
Out[5]:
   0  1
0  1  2
1  3  4

In [6]: df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)

In [7]: df
Out[7]:
   a       b  b1  b2
0  1  (1, 2)   1   2
1  2  (3, 4)   3   4
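
Applied to the question itself, the same tolist technique can be wrapped in a loop that finds every tuple-valued column and replaces it in place with NAME-a and NAME-b columns. This is a minimal sketch, not part of the original answer: the helper name split_tuple_columns and its suffixes parameter are hypothetical, and it assumes every tuple in a column has exactly two elements. The sample frame copies a few values from the question.

```python
import pandas as pd

# Small frame shaped like the question's: one plain column and two tuple columns.
# Values are copied from the first rows shown in the question.
df = pd.DataFrame({
    "y norm test": [64.904368, 70.852681],
    "LCV": [(19.365430594452338, 13.880062435173587),
            (19.099614489458364, 14.018867136617146)],
    "RF": [(9.9484841581029428, 16.46902345373697),
           (10.139848213735391, 16.282141345406522)],
})


def split_tuple_columns(frame, suffixes=("a", "b")):
    """Hypothetical helper: replace each all-tuple column with NAME-a, NAME-b.

    Assumes every tuple in such a column has exactly len(suffixes) elements.
    """
    out = frame.copy()
    for name in frame.columns:
        col = frame[name]
        if not col.map(lambda v: isinstance(v, tuple)).all():
            continue  # leave non-tuple columns untouched
        # Same technique as the answer: one DataFrame from the list of tuples,
        # reusing the original index so the rows stay aligned.
        parts = pd.DataFrame(col.tolist(), index=frame.index,
                             columns=[f"{name}-{s}" for s in suffixes])
        # Put the new columns where the old one was, then drop the old one.
        pos = out.columns.get_loc(name)
        out = out.drop(columns=name)
        for offset, new_name in enumerate(parts.columns):
            out.insert(pos + offset, new_name, parts[new_name])
    return out


result = split_tuple_columns(df)
print(result.columns.tolist())
# ['y norm test', 'LCV-a', 'LCV-b', 'RF-a', 'RF-b']
```

Inserting at the old column's position keeps the column order readable; if order does not matter, assigning the new columns and calling drop once is simpler.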

Note: an earlier version of this answer recommended using df['b'].apply(pd.Series) instead of pd.DataFrame(df['b'].tolist(), index=df.index). That also works (because it makes a Series of each tuple, which is then treated as a row of a dataframe), but it is slower and uses more memory than the tolist version, as noted by the other answers here (thanks to denfromufa).
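
A small sketch can compare the two approaches from the note on the answer's example frame, then time them roughly on a larger, made-up column. The synthetic data is illustrative only, and timings depend on the machine and pandas version, so no figures are quoted here.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [(1, 2), (3, 4)]})

# Recommended version: one DataFrame built from the whole list of tuples.
via_tolist = pd.DataFrame(df["b"].tolist(), index=df.index)

# Earlier version: one Series per tuple, which pandas stacks into rows.
via_apply = df["b"].apply(pd.Series)

# Both give the same values under the same row index and column labels (0, 1).
print(via_tolist.values.tolist())  # [[1, 2], [3, 4]]
print(via_apply.values.tolist())   # [[1, 2], [3, 4]]

# Rough timing on a larger synthetic column (illustrative data only).
big = pd.Series([(i, i + 1) for i in range(2_000)])
t_tolist = timeit.timeit(
    lambda: pd.DataFrame(big.tolist(), index=big.index), number=5)
t_apply = timeit.timeit(lambda: big.apply(pd.Series), number=5)
print(f"tolist: {t_tolist:.4f}s  apply(pd.Series): {t_apply:.4f}s")
```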
