Checking duplicates, summing them, and deleting one row after summing


Problem description


I have a data frame that contains some duplicate rows. Where rows are duplicated, I want to sum two of the columns and then delete the redundant row.

Here is an example of the data:

Year    ID  Lats     Longs      N   n   c_id
2015    200 30.5417 -20.5254    150 30  4142
2015    200 30.5417 -20.5254    90  50  4142

I want to sum columns N and n so that the duplicates collapse into one row. The rest of the information (Lats, Longs, ID, Year and c_id) should remain the same, for example:

Year    ID  Lats     Longs      N   n   c_id
2015    200 30.5417 -20.5254    240 80  4142

Solution

Solution using data.table:

library(data.table)

# Example data: two rows that share the same year, ID, Lats, Longs and c_id
df <- data.frame(year = c(2015, 2015), ID = c(200, 200),
                 Lats = c(30.5417, 30.5417), Longs = c(-20.5254, -20.5254),
                 N = c(150, 90), n = c(30, 50), c_id = c(4142, 4142))

# Group by the identifying columns and sum every remaining column (N and n)
dt <- data.table(df)
dt[, lapply(.SD, sum), by="c_id,year,ID,Lats,Longs"]

   c_id year  ID    Lats    Longs   N  n
1: 4142 2015 200 30.5417 -20.5254 240 80
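
The call above sums every column that is not listed in by. If the real data has further columns that should not be summed, a variant that names the summed columns explicitly may be safer. The following is only a sketch: the names key_cols and res are introduced here for illustration, and it assumes that N and n are the only columns to add up.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(year = c(2015, 2015), ID = c(200, 200),
                 Lats = c(30.5417, 30.5417), Longs = c(-20.5254, -20.5254),
                 N = c(150, 90), n = c(30, 50), c_id = c(4142, 4142))

# Columns that identify a duplicate (an assumption based on the example)
key_cols <- c("c_id", "year", "ID", "Lats", "Longs")

# .SDcols restricts .SD to N and n, so only these two columns are summed
res <- dt[, lapply(.SD, sum), by = key_cols, .SDcols = c("N", "n")]
res
```

With .SDcols, any column that is neither a grouping column nor listed in .SDcols is left out of the result instead of being summed.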

Solution using plyr:

library(plyr)
# Split df by the identifying columns and return the sums of N and n for each group
ddply(df, .(c_id, year, ID, Lats, Longs), function(x) c(N=sum(x$N), n=sum(x$n)))

  c_id year  ID    Lats    Longs   N  n
1 4142 2015 200 30.5417 -20.5254 240 80
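
If installing a package is not an option, the same result can probably be reached with base R alone. The sketch below is not part of the original answer. It first flags the duplicated rows with duplicated() and then collapses them with aggregate(). The names key_cols, is_dup and res are hypothetical, and the choice of key columns is an assumption taken from the example data.

```r
# Same example data as in the question
df <- data.frame(year = c(2015, 2015), ID = c(200, 200),
                 Lats = c(30.5417, 30.5417), Longs = c(-20.5254, -20.5254),
                 N = c(150, 90), n = c(30, 50), c_id = c(4142, 4142))

# Columns that identify a duplicate (an assumption based on the example)
key_cols <- c("c_id", "year", "ID", "Lats", "Longs")

# Flag every row whose key combination occurs more than once
is_dup <- duplicated(df[key_cols]) | duplicated(df[key_cols], fromLast = TRUE)
df[is_dup, ]

# Sum N and n within each key combination; one row per combination remains
res <- aggregate(cbind(N, n) ~ c_id + year + ID + Lats + Longs,
                 data = df, FUN = sum)
res
```

Rows whose key combination is unique pass through aggregate() unchanged, so there is no need to treat the non-duplicated rows separately.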
