Aggregating in R

Question

I have a data frame with two columns. I want to add an additional two columns to the data set with counts based on aggregates.

df <- structure(list(ID = c(1045937900, 1045937900), 
SMS.Type = c("DF1", "WCB14"), 
SMS.Date = c("12/02/2015 19:51", "13/02/2015 08:38"), 
Reply.Date = c("", "13/02/2015 09:52")
), row.names = 4286:4287, class = "data.frame")

I simply want to count the number of instances of SMS.Type and Reply.Date that are not null. For the toy example above, that gives 2 for SMS.Type and 1 for Reply.Date.

I then want to add these to the data frame as total counts (I'm aware they will be repeated on every row of the original data set, but that's OK).

I have been playing around with the aggregate and count functions, but to no avail:

mytempdf <-aggregate(cbind(testtrain$SMS.Type,testtrain$Response.option)~testtrain$ID,
                  train, 
                  function(x) length(unique(which(!is.na(x)))))

mytempdf <- aggregate(testtrain$Reply.Date~testtrain$ID,
                  testtrain, 
                  function(x) length(which(!is.na(x))))

Thanks for your time.

Recommended answer

Using data.table, you could do the following (I've added a real NA to your original data). I'm also not sure whether you are really looking for length(unique()) or just length.

library(data.table)
cols <- c("SMS.Type", "Reply.Date")
setDT(df)[, paste0(cols, ".count") := 
                  lapply(.SD, function(x) length(unique(na.omit(x)))), 
                  .SDcols = cols, 
            by = ID]
#            ID SMS.Type         SMS.Date       Reply.Date SMS.Type.count Reply.Date.count
# 1: 1045937900      DF1 12/02/2015 19:51               NA              2                1
# 2: 1045937900    WCB14 13/02/2015 08:38 13/02/2015 09:52              2                1
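
The answer mentions adding a real NA to the data but does not show that step. A minimal sketch of one way to do it, assuming that empty strings in the question's data stand for missing values (that assumption, and the loop below, are not part of the original answer):

```r
library(data.table)

# Toy data from the question (Reply.Date contains an empty string)
df <- data.frame(
  ID = c(1045937900, 1045937900),
  SMS.Type = c("DF1", "WCB14"),
  SMS.Date = c("12/02/2015 19:51", "13/02/2015 08:38"),
  Reply.Date = c("", "13/02/2015 09:52"),
  stringsAsFactors = FALSE
)
cols <- c("SMS.Type", "Reply.Date")

# Assumption: "" means "missing", so turn it into a real NA
for (col in cols) {
  df[[col]][which(df[[col]] == "")] <- NA
}

# Same per-ID distinct count as in the answer
setDT(df)
df[, paste0(cols, ".count") :=
     lapply(.SD, function(x) length(unique(na.omit(x)))),
   .SDcols = cols, by = ID]
```

Without the conversion, the empty string would be treated as an ordinary value and Reply.Date.count would come out as 2 rather than 1.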

In the devel version (v >= 1.9.5) you could also use the uniqueN function.
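
A sketch of what the uniqueN variant might look like. It assumes an installed data.table release recent enough to provide uniqueN together with its na.rm argument; the object name dt is illustrative only.

```r
library(data.table)

# Toy data from the question, with a real NA in Reply.Date
dt <- data.table(
  ID = c(1045937900, 1045937900),
  SMS.Type = c("DF1", "WCB14"),
  Reply.Date = c(NA, "13/02/2015 09:52")
)
cols <- c("SMS.Type", "Reply.Date")

# uniqueN(x, na.rm = TRUE) counts distinct non-missing values,
# replacing length(unique(na.omit(x)))
dt[, paste0(cols, ".count") := lapply(.SD, uniqueN, na.rm = TRUE),
   .SDcols = cols, by = ID]
```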

Explanation

This is a general solution that works on any number of columns. All you need to do is put the column names into cols.


  1. lapply(.SD, ...) calls a function on each of the columns specified in .SDcols = cols.
  2. paste0(cols, ".count") creates the new column names by appending ".count" to the column names specified in cols.
  3. := performs assignment by reference, meaning it fills the newly created columns in place with the output of lapply(.SD, ...).
  4. The by argument specifies the column(s) to aggregate by.
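
The answer leaves open whether a distinct count or a plain count is wanted. The sketch below runs both side by side on hypothetical data (the repeated SMS.Type value and the column suffixes .n and .distinct are made up for illustration) so the difference is visible:

```r
library(data.table)

# Hypothetical data: one ID, a repeated SMS.Type, one missing reply
dt <- data.table(
  ID = c(1, 1, 1),
  SMS.Type = c("DF1", "DF1", "WCB14"),
  Reply.Date = c(NA, "13/02/2015 09:52", "13/02/2015 09:52")
)
cols <- c("SMS.Type", "Reply.Date")

# Plain count: number of non-missing values per ID
dt[, paste0(cols, ".n") := lapply(.SD, function(x) sum(!is.na(x))),
   .SDcols = cols, by = ID]

# Distinct count: number of different non-missing values per ID
dt[, paste0(cols, ".distinct") :=
     lapply(.SD, function(x) length(unique(na.omit(x)))),
   .SDcols = cols, by = ID]
```

Here the plain counts are 3 and 2, while the distinct counts are 2 and 1, so which function to pass to lapply depends on whether repeated values should be counted once or every time.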
