Merge pandas DataFrame columns starting with the same letters
Question
Let's say I have a DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({'a1':[1,2],'a2':[3,4],'b1':[5,6],'b2':[7,8],'c':[9,0]})
>>> df
   a1  a2  b1  b2  c
0   1   3   5   7  9
1   2   4   6   8  0
I want to merge (or rather, concatenate) the columns whose names start with the same letter, such as a1 and a2, and likewise for the others. As you can see, the c column stands alone with no similarly named partner, so I don't want it to throw an error; instead, it should be padded with NaN values.
I want the merge to turn the wide DataFrame into a long DataFrame, basically a wide-to-long transformation.
I already have a solution to the problem, but it's very inefficient, and I would like a faster one (unlike mine :P). I currently use a for loop with a try/except (ugh, sounds bad already), like this:
>>> df2 = pd.DataFrame()
>>> for i in df.columns.str[:1].unique():
...     try:
...         df2[i] = df[[x for x in df.columns if x[:1] == i]].values.flatten()
...     except ValueError:
...         # Fewer values than existing rows: pad the tail with NaN
...         l = df[[x for x in df.columns if x[:1] == i]].values.flatten().tolist()
...         df2[i] = l + [float('nan')] * (len(df2) - len(l))
...
>>> df2
   a  b    c
0  1  5  9.0
1  3  7  0.0
2  2  6  NaN
3  4  8  NaN
I would like to obtain the same result with better code.
Recommended answer
Use a dictionary comprehension:
df = pd.DataFrame({i: pd.Series(x.to_numpy().ravel())
                   for i, x in df.groupby(lambda x: x[0], axis=1)})
print(df)
   a  b    c
0  1  5  9.0
1  3  7  0.0
2  2  6  NaN
3  4  8  NaN
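The padding in column c comes from how pd.DataFrame treats a dict of Series: it aligns them on their index and fills the missing positions with NaN, so no try/except is needed. As a hedged variation on the answer, the sketch below builds the groups from the column names directly rather than calling groupby with axis=1, which newer pandas releases deprecate. The names groups and long_df are illustrative and not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a1': [1, 2], 'a2': [3, 4],
                   'b1': [5, 6], 'b2': [7, 8],
                   'c': [9, 0]})

# Map each first letter to the list of columns that start with it
# (dicts keep insertion order, so the letters stay in column order).
groups = {}
for col in df.columns:
    groups.setdefault(col[0], []).append(col)

# Flatten each group's values row by row into one Series per letter.
# Series of different lengths are aligned on their index, so shorter
# ones are padded with NaN when the DataFrame is built.
long_df = pd.DataFrame({
    letter: pd.Series(df[cols].to_numpy().ravel())
    for letter, cols in groups.items()
})

print(long_df)
```

This should reproduce the table shown above. Column c becomes float because an integer column cannot hold NaN, which is also why the original output shows 9.0 and 0.0.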