Merge pandas dataframes with a column operation

Question

I searched the archives but didn't find what I wanted (probably because I don't really know which keywords to use).

Here is my problem: I have a bunch of dataframes that need to be merged, and I also want to update the values of a subset of columns with the sum across the dataframes.

For example, I have two dataframes, df1 and df2:

import pandas as pd
df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])  # rows 0 and 2 only

df1             df2
    a   b           a   b
0   1   2       0   1   5
1   1   3       2   0   6
2   0   4

After merging, I'd like column 'b' to be updated with the sum of the matched records, while column 'a' should stay as it was in df1 (or df2, I don't really care):

    a   b
0   1   7
1   1   3
2   0   10
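
For the two-frame case, one way to get this table with built-in methods is sketched below. It is a minimal sketch, assuming df2 holds only index labels 0 and 2 as in the printed tables: `DataFrame.combine_first` takes the union of the indexes and keeps df1's values where both frames have one, and `Series.add(..., fill_value=0)` sums 'b' on matching labels, treating a label missing from one frame as 0.

```python
import pandas as pd

# Sample frames matching the printed tables; df2 only has index labels 0 and 2.
df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])

# Union of both indexes; where both frames have a value, df1's value is kept.
merged = df1.combine_first(df2)
# Overwrite 'b' with the label-aligned sum; a label missing from one frame counts as 0.
merged["b"] = df1["b"].add(df2["b"], fill_value=0)
print(merged)
```

Column 'b' comes out as 7, 3, 10; because of the index alignment, pandas may store it as floats.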

Now extend this to merging three or more dataframes.

Are there straightforward, built-in tricks to do this? Or do I need to process the frames one by one, line by line?
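
For three or more frames, a pairwise merge can be folded over a list with `functools.reduce`, so no row-by-row loop is needed. This is a minimal sketch: `merge_two` is a hypothetical helper, and `df_extra` is a made-up third frame whose index only partly overlaps the others (label 3 appears nowhere else).

```python
from functools import reduce

import pandas as pd


def merge_two(left, right):
    """Hypothetical helper: keep 'a' from the left frame, sum 'b' across both."""
    out = left.combine_first(right)  # union of index labels, left's values win
    out["b"] = left["b"].add(right["b"], fill_value=0)  # missing label counts as 0
    return out


df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])
# Hypothetical third frame; label 3 appears in no other frame.
df_extra = pd.DataFrame([[0, 1], [1, 8]], columns=["a", "b"], index=[2, 3])

# Folds left to right: merge_two(merge_two(df1, df2), df_extra).
merged = reduce(merge_two, [df1, df2, df_extra])
print(merged)
```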

===== Edit / Clarification =====

In the real-world case, each dataframe may contain index labels that are not in the other dataframes. The merged dataframe should then contain all of them, and the entries whose index is shared should be updated with the sum (or some other operation).
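
Another built-in route that handles any number of frames, including labels present in only some of them, is to stack the frames with `pd.concat` and collapse repeated labels with `groupby(level=0).agg(...)`. The per-column mapping is where "sum (or some other operation)" goes. This is a sketch, not the answer's method; `"first"` for 'a' assumes the duplicated 'a' values agree, and `df_extra` is a hypothetical third frame.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])
# Hypothetical extra frame; label 3 appears nowhere else.
df_extra = pd.DataFrame([[0, 1], [1, 8]], columns=["a", "b"], index=[2, 3])

# Stack vertically: overlapping index labels now appear several times.
stacked = pd.concat([df1, df2, df_extra])
# One aggregation per column; "sum" could be swapped for "mean", "max", a callable, etc.
merged = stacked.groupby(level=0).agg({"a": "first", "b": "sum"})
print(merged)
```

`groupby` sorts the labels by default, so the result is indexed 0, 1, 2, 3. Since no NaN is introduced by a side-by-side alignment here, the columns keep their integer dtype.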

Answer

This is only a partial solution so far, but it solves the main point:

df3 = pd.concat([df1, df2], join="outer", axis=1)  # columns: a, b, a, b
df4 = df3.b.sum(axis=1)  # row-wise sum of the two 'b' columns, NaNs skipped

df3 will have two 'a' columns and two 'b' columns. Calling sum(axis=1) on df3.b adds the two 'b' columns row by row and ignores NaNs. df4 now holds the sum of df1's and df2's 'b' columns for all the indexes; note that it is a Series rather than a dataframe with a column 'b'.
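
A self-contained run of this approach, assuming df2 holds only index labels 0 and 2 as in the question's tables. It checks the duplicated column layout and turns the Series `df4` into a one-column frame named 'b' with `Series.to_frame`; `result` is a name introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])

# Side-by-side concat; row 1 is missing from df2, so its df2 cells become NaN.
df3 = pd.concat([df1, df2], join="outer", axis=1)
print(df3.columns.tolist())  # duplicated labels: a, b, a, b

# Selecting a duplicated label returns a DataFrame holding every 'b' column;
# sum(axis=1) adds them row by row and skips NaN by default.
df4 = df3["b"].sum(axis=1)

# df4 is a Series; name it 'b' to get a one-column DataFrame.
result = df4.to_frame("b")
print(result)
```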

This doesn't solve column 'a', though. In my real case, df3.a contains only a few NaNs, and the remaining values in each row of df3.a should be identical. I haven't found a straightforward way to create a column 'a' in df4 filled with the non-NaN value. I'm now looking for a "count" function to get the occurrences of the elements in each row of df3.a (imagine it has a few dozen 'a' columns).
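
For column 'a', the "count" idea maps onto `DataFrame.mode(axis=1)`, which returns the most frequent non-NaN value of each row; if the values are known to agree, taking the first non-NaN value with `bfill(axis=1)` also works. A minimal sketch on the two-frame example, assuming df2 holds index labels 0 and 2; `a_mode`, `a_first` and `merged` are illustrative names, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])
df3 = pd.concat([df1, df2], join="outer", axis=1)

a_cols = df3["a"]  # one 'a' column per input frame, NaN where a frame lacks the row

# Option 1: most frequent value per row; mode ignores NaN by default.
# Column 0 of the result holds the (smallest) mode of each row.
a_mode = a_cols.mode(axis=1)[0]

# Option 2: first non-NaN value per row, scanning the columns left to right.
a_first = a_cols.bfill(axis=1).iloc[:, 0]

merged = pd.DataFrame({"a": a_mode, "b": df3["b"].sum(axis=1)})
print(merged)
```

If a row has a tie, `mode` returns several columns sorted in ascending order, and `[0]` keeps the smallest value, so a tie silently resolves to the lowest value.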
