How can I split a column of tuples in a Pandas dataframe?
Problem description
I have a Pandas dataframe (this is only a small piece of it):
>>> d1
   y norm test  y norm train  len(y_train)  len(y_test)
0    64.904368    116.151232          1645          549
1    70.852681    112.639876          1645          549
                                    SVR RBF
0    (35.652207342877873, 22.95533537448393)
1   (39.563683797747622, 27.382483096332511)
                                        LCV
0   (19.365430594452338, 13.880062435173587)
1   (19.099614489458364, 14.018867136617146)
                                   RIDGE CV
0   (4.2907610988480362, 12.416745648065584)
1     (4.18864306788194, 12.980833914392477)
                                         RF
0    (9.9484841581029428, 16.46902345373697)
1   (10.139848213735391, 16.282141345406522)
                                           GB
0  (0.012816232716538605, 15.950164822266007)
1  (0.012814519804493328, 15.305745202851712)
                                             ET  DATA
0  (0.00034337162272515505, 16.284800366214057)   j2m
1  (0.00024811554516431878, 15.556506191784194)   j2m
>>>
I want to split all the columns that contain tuples. For example, I want to replace the column LCV with the columns LCV-a and LCV-b.
How can I do that?
You can do this by calling pd.DataFrame(col.tolist()) on that column:
In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a':[1,2], 'b':[(1,2), (3,4)]})

In [3]: df
Out[3]:
   a       b
0  1  (1, 2)
1  2  (3, 4)

In [4]: df['b'].tolist()
Out[4]: [(1, 2), (3, 4)]

In [5]: pd.DataFrame(df['b'].tolist(), index=df.index)
Out[5]:
   0  1
0  1  2
1  3  4

In [6]: df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)

In [7]: df
Out[7]:
   a       b  b1  b2
0  1  (1, 2)   1   2
1  2  (3, 4)   3   4
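The same pattern can be applied to every tuple column at once, which is what the question asks for (LCV becoming LCV-a and LCV-b). The following is only a sketch under a few assumptions: the helper name split_tuple_columns and its suffixes parameter are made up for illustration, column names are assumed to be unique, a column is split only if every value in it is a tuple of the expected length, and the sample values are rounded stand-ins for the question's data.

```python
import pandas as pd


def split_tuple_columns(df, suffixes=("a", "b")):
    """Return a copy of df where each all-tuple column is replaced by one column per element.

    Hypothetical helper for illustration; assumes unique column names.
    """
    out = df.copy()
    for col in df.columns:
        values = df[col].tolist()
        # Split only columns whose every value is a tuple of the expected length.
        is_tuple_col = bool(values) and all(
            isinstance(v, tuple) and len(v) == len(suffixes) for v in values
        )
        if not is_tuple_col:
            continue
        new_cols = [f"{col}-{s}" for s in suffixes]
        parts = pd.DataFrame(values, index=df.index, columns=new_cols)
        # Put the new columns where the original column was.
        pos = out.columns.get_loc(col)
        out = pd.concat([out.iloc[:, :pos], parts, out.iloc[:, pos + 1:]], axis=1)
    return out


# Rounded stand-in values, not the exact numbers from the question.
d1 = pd.DataFrame({
    "LCV": [(19.4, 13.9), (19.1, 14.0)],
    "RF": [(9.9, 16.5), (10.1, 16.3)],
    "DATA": ["j2m", "j2m"],
})

result = split_tuple_columns(d1)
print(result.columns.tolist())
# ['LCV-a', 'LCV-b', 'RF-a', 'RF-b', 'DATA']
```

Non-tuple columns such as DATA pass through unchanged, and the original frame is not modified because the helper works on a copy.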
Note: an earlier version of this answer recommended df['b'].apply(pd.Series) instead of pd.DataFrame(df['b'].tolist(), index=df.index). That works as well (because it turns each tuple into a Series, which is then treated as a row of a dataframe), but it is slower and uses more memory than the tolist version, as noted by the other answers here (thanks to denfromufa).
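To check the note above on your own data, a rough comparison could look like the sketch below. It confirms that both approaches produce the same values and times each one with the standard timeit module. The frame size (2000 rows) and repeat count are arbitrary choices for illustration, and the timings will vary with the machine and the pandas version, so no specific numbers are claimed here.

```python
import timeit

import pandas as pd

# Arbitrary illustrative data: 2000 rows of 2-tuples.
df = pd.DataFrame({'b': [(i, i + 1) for i in range(2000)]})

via_apply = df['b'].apply(pd.Series)
via_tolist = pd.DataFrame(df['b'].tolist(), index=df.index)

# Both approaches should yield the same values.
same_values = (via_apply.to_numpy() == via_tolist.to_numpy()).all()
print("same values:", same_values)

# Time each approach; results depend on the environment.
t_apply = timeit.timeit(lambda: df['b'].apply(pd.Series), number=3)
t_tolist = timeit.timeit(
    lambda: pd.DataFrame(df['b'].tolist(), index=df.index), number=3
)
print(f"apply(pd.Series): {t_apply:.4f}s, tolist: {t_tolist:.4f}s")
```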