Find the duplicate rows of one column, then add the corresponding rows of other columns

Problem description

I want to find the duplicate rows in one column and add up the corresponding rows of the other columns. Suppose the DataFrame is as follows:

 A     B         C         D  E  F  G  
13348  xyzqr     324580    1  1  1  1
13348  grpret    325810    4  4  4  4
45832  gberthh   258729    2  1  3  4
45832  bhdrffe   025892    2  1  1  4
58712  bgrtw     984562    2  2  2  2
76493  hzrt      638495    1  1  1  2
643509 .         T648501   1  1  1  1
643509 .         R648501   1  1  1  1

After adding up the corresponding rows of columns D, E, F and G, I need to check whether the summed values are equal across those columns. If they are equal, that common value goes into the output column; otherwise the output is 0.
In the example above, 13348 appears in two rows, and summing them gives 5 for every column (D=5, E=5, F=5, G=5), so the output should be 5. For 45832, the sums differ (D=4, E=2, F=4, G=8), so the output should be 0.
The remaining columns (B and C) should take their values from the first row (lowest index) of each group of duplicates.

The output should be as follows:

   A     B       C       D 
  13348  xyzqr   324580  5
  45832  gberthh 258729  0
  58712  bgrtw   984562  2
  76493  hzrt    638495  0
  643509 .       T648501 2

Any suggestions would be appreciated.

Recommended answer

I think you need:

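import numpy as np

cols = ['D', 'E', 'F', 'G']
# For each group, transpose so D..G become rows, then flag the rows
# (the original columns) that have an identical twin within the group.
df1 = df.groupby('A')[cols].apply(lambda x: x.T.duplicated(keep=False))
# Where every column is flagged, use the group sum of D; otherwise 0.
# Note: this relies on df being sorted by A, so that the group order
# matches the row order kept by drop_duplicates below.
arr = np.where(df1.all(axis=1), df.groupby('A')[cols[0]].sum(), 0)
# Drop D..G, keep the first row per value of A, and attach the new D.
df = df.drop(cols, axis=1).drop_duplicates('A').assign(D=arr)
print(df)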
        A        B        C  D
0   13348    xyzqr   324580  5
2   45832  gberthh   258729  0
4   58712    bgrtw   984562  2
5   76493     hzrt   638495  0
6  643509        .  T648501  2
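
To make the transposed `duplicated` check easier to follow, here is a small sketch that rebuilds the question's table and inspects two groups on their own. The variable names `flags_equal` and `flags_mixed` are illustrative only, and column C is assumed to hold strings because of values such as 'T648501'.

```python
import pandas as pd

# Sample data copied from the question; C is stored as strings (assumption).
df = pd.DataFrame({
    'A': [13348, 13348, 45832, 45832, 58712, 76493, 643509, 643509],
    'B': ['xyzqr', 'grpret', 'gberthh', 'bhdrffe', 'bgrtw', 'hzrt', '.', '.'],
    'C': ['324580', '325810', '258729', '025892', '984562', '638495',
          'T648501', 'R648501'],
    'D': [1, 4, 2, 2, 2, 1, 1, 1],
    'E': [1, 4, 1, 1, 2, 1, 1, 1],
    'F': [1, 4, 3, 1, 2, 1, 1, 1],
    'G': [1, 4, 4, 4, 2, 2, 1, 1],
})
cols = ['D', 'E', 'F', 'G']

# After transposing, each of D..G becomes a row whose values are that
# column's entries within the group. duplicated(keep=False) marks every
# row that has at least one identical twin.
flags_equal = df.loc[df['A'] == 13348, cols].T.duplicated(keep=False)
flags_mixed = df.loc[df['A'] == 45832, cols].T.duplicated(keep=False)

print(flags_equal)  # one boolean per column D..G for group 13348
print(flags_mixed)  # one boolean per column D..G for group 45832
```

For 13348 all four columns are identical, so every flag is True; for 45832 no two columns match, so every flag is False. One caveat: the check asks whether each column has some twin, so a group in which D equals E and F equals G, but D differs from F, would also be flagged True throughout.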

An alternative solution that checks, for each group, whether all values are duplicates:

cols = ['D', 'E', 'F', 'G']
# One boolean per group: True if every column D..G has an identical twin.
m = df.groupby('A')[cols].apply(lambda x: x.T.duplicated(keep=False).all())
print(m)
A
13348     True
45832    False
dtype: bool

# Use the group sum of D where the check passed, otherwise 0.
arr = np.where(m, df.groupby('A')[cols[0]].sum(), 0)
df = df.drop(cols, axis=1).drop_duplicates('A').assign(D=arr)
print(df)
        A        B        C  D
0   13348    xyzqr   324580  5
2   45832  gberthh   258729  0
4   58712    bgrtw   984562  2
5   76493     hzrt   638495  0
6  643509        .  T648501  2
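
As a further variation, and not part of the answer above, the rule from the question can be expressed directly: sum D..G per group, then compare every summed column with the first one. The sketch below assumes a hypothetical helper name, `collapse_duplicates`, and aligns the result by the value of A rather than by position, so it does not depend on the frame being sorted.

```python
import pandas as pd

def collapse_duplicates(df, key='A', cols=('D', 'E', 'F', 'G'), out='D'):
    """Hypothetical helper: one row per key, with `out` set to the common
    group sum when all summed columns agree, otherwise 0."""
    cols = list(cols)
    # Per-group sums of the numeric columns, indexed by the key.
    sums = df.groupby(key, sort=False)[cols].sum()
    # True where every summed column equals the first summed column.
    all_equal = sums.eq(sums[cols[0]], axis=0).all(axis=1)
    # First row per key for the remaining columns, indexed by the key.
    firsts = df.drop_duplicates(key).drop(columns=cols).set_index(key)
    # Series assignment aligns on the key, so row order does not matter.
    firsts[out] = sums[cols[0]].where(all_equal, 0)
    return firsts.reset_index()

# Sample data copied from the question; C is stored as strings (assumption).
df = pd.DataFrame({
    'A': [13348, 13348, 45832, 45832, 58712, 76493, 643509, 643509],
    'B': ['xyzqr', 'grpret', 'gberthh', 'bhdrffe', 'bgrtw', 'hzrt', '.', '.'],
    'C': ['324580', '325810', '258729', '025892', '984562', '638495',
          'T648501', 'R648501'],
    'D': [1, 4, 2, 2, 2, 1, 1, 1],
    'E': [1, 4, 1, 1, 2, 1, 1, 1],
    'F': [1, 4, 3, 1, 2, 1, 1, 1],
    'G': [1, 4, 4, 4, 2, 2, 1, 1],
})

result = collapse_duplicates(df)
print(result)
```

On the question's data this gives D values of 5, 0, 2, 0 and 2, matching the expected output. Unlike the transposed `duplicated` check, it compares the sums themselves, so a group whose columns agree only in pairs (for example D equal to E and F equal to G, but D different from F) yields 0.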
