Pandas restacking repeated values to columns
Question
The DataFrame below needs to be restacked so that all values for each region sit on one line. In the example below, the new df would have only 3 lines, one per region, with the corresponding values spread across multiple columns.
The regions may vary, and there may be more than 3. Any suggestions are appreciated.
>>> a
Out[26]:
Area value
0 EUROPE 47
1 ASIA 51
2 AMERICAS 37
3 EUROPE 39
4 ASIA 22
5 AMERICAS 24
Desired output:
Europe 47 39
Asia 51 22
Americas 37 24
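The values should be spread across separate columns.
Recommended answer
You can group by 'Area' and collect each group's values into a list; this will handle a variable number of values.
If you want to split the values out into their own columns, you can call apply and pass the pd.Series constructor: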
In [90]:
df1 = df.groupby('Area')['value'].apply(list).reset_index()
df1[['val1', 'val2']] = df1['value'].apply(pd.Series)
df1
Out[90]:
Area value val1 val2
0 AMERICAS [37, 24] 37 24
1 ASIA [51, 22] 51 22
2 EUROPE [47, 39] 47 39
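The session above assumes a df that already holds the question's data. As a minimal self-contained sketch, the same two steps can be run as a script; the frame is rebuilt here from the six rows shown in the question, and the names val1 and val2 are simply the ones used in the answer.

```python
import pandas as pd

# Rows copied from the question's example frame.
df = pd.DataFrame({
    "Area": ["EUROPE", "ASIA", "AMERICAS", "EUROPE", "ASIA", "AMERICAS"],
    "value": [47, 51, 37, 39, 22, 24],
})

# Collect each area's values into one list per row.
df1 = df.groupby("Area")["value"].apply(list).reset_index()

# Expand each list into its own columns. Two fixed names work only
# because every area has exactly two values in this example.
df1[["val1", "val2"]] = df1["value"].apply(pd.Series)
print(df1)
```

If the areas had differing numbers of values, the fixed two-name assignment would no longer match the number of expanded columns.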
Edit
For a variable number of columns, you can't assign the column names upfront if you don't know the maximum number of values, but you can still use the approach above:
In [94]:
import io
import pandas as pd
t="""index Area value
0 EUROPE 47
1 ASIA 51
2 AMERICAS 37
3 EUROPE 39
4 ASIA 22
5 AMERICAS 24
5 AMERICAS 50"""
df = pd.read_csv(io.StringIO(t), sep=r'\s+')
df
Out[94]:
index Area value
0 0 EUROPE 47
1 1 ASIA 51
2 2 AMERICAS 37
3 3 EUROPE 39
4 4 ASIA 22
5 5 AMERICAS 24
6 5 AMERICAS 50
In [99]:
df1 = df.groupby('Area')['value'].apply(list).reset_index()
df1
Out[99]:
Area value
0 AMERICAS [37, 24, 50]
1 ASIA [51, 22]
2 EUROPE [47, 39]
In [102]:
df1 = pd.concat([df1, df1['value'].apply(pd.Series).fillna(0)], axis=1)
df1
Out[102]:
Area value 0 1 2
0 AMERICAS [37, 24, 50] 37 24 50
1 ASIA [51, 22] 51 22 0
2 EUROPE [47, 39] 47 39 0
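A different route to the same wide layout, not part of the answer above, is to number each value within its area and pivot on that number. The sketch below assumes the seven-row example data; the names pos and wide are made up for illustration. It fills gaps with 0 only to mirror the answer's choice, which makes a missing value indistinguishable from a real zero.

```python
import pandas as pd

# Seven-row example from the answer (AMERICAS has three values).
df = pd.DataFrame({
    "Area": ["EUROPE", "ASIA", "AMERICAS", "EUROPE", "ASIA", "AMERICAS", "AMERICAS"],
    "value": [47, 51, 37, 39, 22, 24, 50],
})

# Position of each value within its area: 0 for the first, 1 for the second, ...
pos = df.groupby("Area").cumcount()

# Use that position as the column label. Areas with fewer values get NaN,
# which is replaced by 0 here to match the answer's output.
wide = (
    df.assign(pos=pos)
      .pivot(index="Area", columns="pos", values="value")
      .fillna(0)
      .astype(int)
      .reset_index()
)
print(wide)
```

This skips the intermediate list column, and the number of output columns follows from the data rather than from names chosen in advance.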