Find the duplicate rows of one column, then add the corresponding rows of the other columns
Problem description
I want to find the duplicate values in one column and add up the corresponding rows of the other columns. If the dataframe is as follows:
A B C D E F G
13348 xyzqr 324580 1 1 1 1
13348 grpret 325810 4 4 4 4
45832 gberthh 258729 2 1 3 4
45832 bhdrffe 025892 2 1 1 4
58712 bgrtw 984562 2 2 2 2
76493 hzrt 638495 1 1 1 2
643509 . T648501 1 1 1 1
643509 . R648501 1 1 1 1
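To reproduce the example, the frame above can be built directly. This is a sketch that assumes column C should be kept as strings (otherwise 025892 would lose its leading zero if parsed as a number); the name `df` matches the code in the answer.

```python
import pandas as pd

# Sample data from the question; C is stored as strings so "025892" keeps its leading zero
df = pd.DataFrame({
    'A': [13348, 13348, 45832, 45832, 58712, 76493, 643509, 643509],
    'B': ['xyzqr', 'grpret', 'gberthh', 'bhdrffe', 'bgrtw', 'hzrt', '.', '.'],
    'C': ['324580', '325810', '258729', '025892', '984562', '638495', 'T648501', 'R648501'],
    'D': [1, 4, 2, 2, 2, 1, 1, 1],
    'E': [1, 4, 1, 1, 2, 1, 1, 1],
    'F': [1, 4, 3, 1, 2, 1, 1, 1],
    'G': [1, 4, 4, 4, 2, 2, 1, 1],
})
print(df)
```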
After adding the corresponding rows of columns D, E, F and G, I need to check whether all of the summed columns are equal. If they are, the output column gets that common value; otherwise it gets 0.
For the example above, 13348 appears twice and adding its rows gives 5 in every column (D=5, E=5, F=5, G=5), so the output should be 5. For 45832, on the other hand, the summed columns differ (D=4, E=2, F=4, G=8), so the output should be 0.
The remaining columns (B and C) should take the values of the duplicate row with the lowest index.
The output should be as follows:
A B C D
13348 xyzqr 324580 5
45832 gberthh 258729 0
58712 bgrtw 984562 2
76493 hzrt 638495 0
643509 . T648501 2
Any suggestions would be appreciated.
Recommended answer
I think you need:
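import numpy as np
import pandas as pd

# df holds the data from the question
cols = ['D','E','F','G']
# for each group, transpose so D..G become rows and flag which of them are duplicated
df1 = df.groupby('A', sort=False)[cols].apply(lambda x: x.T.duplicated(keep=False))
# if every column is a duplicate, use the group sum of D, else 0
arr = np.where(df1.all(axis=1), df.groupby('A', sort=False)[cols[0]].sum(), 0)
# drop D..G, keep the first row per value of A and add the new D column
# (sort=False keeps the groups in first-appearance order, so arr lines up with drop_duplicates)
df = df.drop(cols, axis=1).drop_duplicates('A').assign(D=arr)
print(df)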
A B C D
0 13348 xyzqr 324580 5
2 45832 gberthh 258729 0
4 58712 bgrtw 984562 2
5 76493 hzrt 638495 0
6 643509 . T648501 2
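The transposed `duplicated` test reproduces the expected output for this data, but it checks whether the columns hold identical row values, not whether their sums are equal. A small sketch with two hypothetical groups (made-up data, not from the question) shows where the two checks disagree:

```python
import pandas as pd

cols = ['D', 'E', 'F', 'G']

# Hypothetical data: two groups built so that the two checks disagree
df = pd.DataFrame({
    'A': [1, 1, 2, 2],
    'D': [1, 4, 1, 2],
    'E': [4, 1, 1, 2],
    'F': [2, 3, 3, 4],
    'G': [3, 2, 3, 4],
})

# The answer's test: every transposed column (D..G) must be duplicated within its group
dup_check = df.groupby('A')[cols].apply(lambda x: x.T.duplicated(keep=False).all())

# The question's rule: all column sums within the group must be equal
sum_check = df.groupby('A')[cols].sum().nunique(axis=1).eq(1)

print(pd.DataFrame({'duplicated_check': dup_check, 'equal_sums_check': sum_check}))
```

Group 1 has equal sums (5 in every column) but no identical columns, so the duplicated-based check gives 0. Group 2 has identical pairs of columns but unequal sums (3, 3, 7, 7), so the duplicated-based check would return the sum of D (3) instead of 0. If only the sums matter, comparing them directly is the safer test.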
An alternative solution checks, for each group, whether all of its values are duplicates:
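cols = ['D','E','F','G']
# one boolean per group: True only if every transposed column (D..G) is duplicated
m = df.groupby('A', sort=False)[cols].apply(lambda x: x.T.duplicated(keep=False).all())
print(m)
A
13348 True
45832 False
58712 True
76493 False
643509 True
dtype: bool
# group sum of D where m is True, else 0
arr = np.where(m, df.groupby('A', sort=False)[cols[0]].sum(), 0)
# keep the first row per A without D..G and attach the new D column
df = df.drop(cols, axis=1).drop_duplicates('A').assign(D=arr)
print(df)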
A B C D
0 13348 xyzqr 324580 5
2 45832 gberthh 258729 0
4 58712 bgrtw 984562 2
5 76493 hzrt 638495 0
6 643509 . T648501 2
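If the rule should compare the summed columns, as the question describes, a variant can aggregate first and test the sums directly. This sketch is not part of the original answer: the helper name `collapse_duplicates` and its parameters are hypothetical, and it assumes the remaining columns should come from the first row of each group.

```python
import pandas as pd


def collapse_duplicates(df, key='A', value_cols=('D', 'E', 'F', 'G'), out_col='D'):
    """Hypothetical helper: collapse rows that share `key` into one row.

    The remaining columns come from the first (lowest-index) row of each group.
    `out_col` holds the common sum of `value_cols` if all those sums are equal,
    otherwise 0.
    """
    value_cols = list(value_cols)
    # Column-wise sums of the value columns for each key
    sums = df.groupby(key)[value_cols].sum()
    # True where every summed column in the group has the same value
    all_equal = sums.nunique(axis=1).eq(1)
    # Keep the common sum where the check passes, else 0
    result = sums[value_cols[0]].where(all_equal, 0)
    # First row per key (value columns dropped) supplies B, C, ...
    first_rows = df.drop(columns=value_cols).drop_duplicates(key)
    # Look the result up by key so the group order does not need to match
    return first_rows.assign(**{out_col: first_rows[key].map(result)})
```

On the question's data, `collapse_duplicates(df)` yields D values 5, 0, 2, 0, 2 for rows 0, 2, 4, 5 and 6, matching the expected output. Mapping the result back by key, rather than assigning a positional array, means the groupby order does not have to line up with `drop_duplicates`.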