How best to extract a Pandas column containing lists or tuples into multiple columns
Question
I accidentally closed this question with a link to the wrong duplicate. Here is the correct one: "Pandas split column of lists into multiple columns".
Suppose I have a dataframe in which one column holds lists (of a known and identical length) or tuples, for example:
import pandas as pd

df1 = pd.DataFrame(
    {'vals': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]}
)
That is:
           vals
0  [a, b, c, d]
1  [e, f, g, h]
I want to extract the values in "vals" into separate named columns. I can do this clumsily by iterating through the rows:
for i in range(df1.shape[0]):
    for j in range(0, 4):
        # str(j) is needed: concatenating a str and an int raises TypeError
        df1.loc[i, 'vals_' + str(j)] = df1.loc[i, 'vals'][j]
Desired result:
           vals vals_0 vals_1 vals_2 vals_3
0  [a, b, c, d]      a      b      c      d
1  [e, f, g, h]      e      f      g      h
Is there a neater (vectorised) way? I tried using [] but I get an error:
for j in range(0, 4):
    df1['vals_' + str(j)] = df1['vals'][j]
This gives:
ValueError: Length of values does not match length of index
It looks like Pandas is applying the [] operator to the Series itself rather than to the contents of each cell in the column.
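To illustrate that diagnosis, here is a minimal sketch, assuming the df1 defined above. Indexing the Series with [j] looks up the row labelled j, so it returns one whole list, whereas the .str accessor indexes into every element and also works on lists. The names first_row and first_items are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'vals': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]})

# [0] on the Series selects the row labelled 0: a single 4-item list,
# which cannot be assigned as a column of a 2-row frame.
first_row = df1['vals'][0]

# .str[0] takes item 0 from the list in every row instead.
first_items = df1['vals'].str[0]
```

Looping over j with .str[j] would therefore fill the columns one at a time, although it still makes one pass over the column per new column.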
Recommended answer
You can use assign and apply together with pd.Series:
df1.assign(**df1.vals.apply(pd.Series).add_prefix('val_'))
A faster method for larger data is to pass .values.tolist() to the DataFrame constructor:
df1.assign(**pd.DataFrame(df1.vals.values.tolist()).add_prefix('val_'))
Output:
           vals val_0 val_1 val_2 val_3
0  [a, b, c, d]     a     b     c     d
1  [e, f, g, h]     e     f     g     h
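One caveat worth sketching: a DataFrame built from tolist() gets a fresh 0..n-1 index, so when the original frame carries any other index the new columns may not line up with its rows. Below is a minimal sketch, assuming a hypothetical frame df2 with string labels, that passes index= explicitly and uses join rather than assign. The names df2, expanded and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical frame with a non-default index (not from the original question).
df2 = pd.DataFrame(
    {'vals': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]},
    index=['x', 'y'],
)

# Build the expanded columns, reusing df2's index so rows stay aligned.
expanded = pd.DataFrame(df2['vals'].tolist(), index=df2.index).add_prefix('val_')

# join aligns on the index and keeps the original 'vals' column.
result = df2.join(expanded)
```

With the default RangeIndex used in the question, the index= argument makes no difference, so this is only a safeguard for frames that have been filtered, sorted or re-indexed.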