Using ifelse with transform in ddply


Question

I am trying to use ddply with transform to populate a new variable (summary_Date) in a data frame that has the variables ID and Date. The value of the new variable is chosen with ifelse, based on the length of the piece being evaluated:

If there are fewer than five observations for an ID in a given month, I want summary_Date to be calculated by rounding the date to the nearest month (using round_date from the lubridate package); if there are five or more observations for an ID in a given month, I want summary_Date to simply be Date.

require(plyr)
require(lubridate)

test.df <- structure(
  list(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1
                , 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,2, 2, 2, 2, 2, 2, 2, 2)
       , Date = structure(c(-247320000, -246196800, -245073600, -243864000
                            , -242654400, -241444800, -126273600, -123595200
                            , -121176000, -118497600, 1359385200, 1359388800
                            , 1359392400, 1359396000, 1359399600, 1359403200
                            , 1359406800, 1359410400, 1359414000, 1359417600
                            , 55598400, 56116800, 58881600, 62078400, 64756800
                            , 67348800, 69854400, 72964800, 76161600, 79012800
                            , 1358589600, 1358676000, 1358762400, 1358848800
                            , 1358935200, 1359021600, 1359108000, 1359194400
                            , 1359280800, 1359367200), tzone = "GMT"
                          , class = c("POSIXct", "POSIXt"))
       , Val=rnorm(40))
  , .Names = c("ID", "Date", "Val"), row.names = c(NA, 40L)
  , class = "data.frame")

test.df <- ddply(test.df, .(ID, floor_date(Date, "month")), transform
                 , summary_Date=as.POSIXct(ifelse(length(ID)<5
                                                  , round_date(Date, "month")
                                                  ,Date)
                                           , origin="1970-01-01 00:00.00"
                                           , tz="GMT")
                 # Included length_x to easily see the length of the subset
                 , length_x = length(ID))

head(test.df,5)
#   floor_date(Date, "month") ID                Date        Val summary_Date length_x
# 1                1962-03-01  1 1962-03-01 12:00:00 -0.1037988   1962-03-01        3
# 2                1962-03-01  1 1962-03-14 12:00:00  0.2923056   1962-03-01        3
# 3                1962-03-01  1 1962-03-27 12:00:00  0.4435410   1962-03-01        3
# 4                1962-04-01  1 1962-04-10 12:00:00  0.1159164   1962-04-01        2
# 5                1962-04-01  1 1962-04-24 12:00:00  2.9824075   1962-04-01        2

The ifelse statement seems to be working, but the value in summary_Date appears to be the first value calculated for the subset that transform is working on, rather than a row-specific value. For example, in row 3, summary_Date should be 1962-04-01, because the date 1962-03-27 12:00:00 should be rounded up (there are fewer than five rows in the subset). Instead, the first calculated value of summary_Date (1962-03-01) is repeated in all rows of that subset.
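The repeated value described above is consistent with how ifelse shapes its result: it returns a value with the same length as its test argument, and length(ID) < 5 is a single logical value. transform then recycles that single result across every row of the piece. The following is a minimal base R sketch of that behavior, not a trace of plyr internals; the dates are illustrative rather than taken from test.df, and adding one day is only a stand-in for round_date.

```r
# Base R only. These dates are illustrative, not rows from test.df.
dates <- as.POSIXct(c("1962-03-01 12:00:00", "1962-03-14 12:00:00",
                      "1962-03-27 12:00:00"), tz = "GMT")

# Stand-in for round_date(): shift every date by one day (86400 seconds).
shifted <- dates + 86400

# Scalar test: length(dates) < 5 is a single TRUE, so ifelse() returns a
# single value, taken from the first element of `shifted`.
scalar_result <- ifelse(length(dates) < 5, shifted, dates)
length(scalar_result)  # 1

# Vector test: repeating the condition once per row yields one result per row.
vector_test <- rep(length(dates) < 5, length(dates))
vector_result <- ifelse(vector_test, shifted, dates)
length(vector_result)  # 3
```

In both cases ifelse drops the POSIXct class and returns plain numbers, which is why the code in the question wraps the call in as.POSIXct with an origin.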

Edit: I was inspired by Ricardo's answer using data.table to try it in two steps with ddply. That works too:

test.df <- ddply(test.df, .(ID, floor_date(Date, "month")), transform
                 , length_x = length(ID))

test.df <- ddply(test.df, .(ID, floor_date(Date, "month")), transform
                 , summary_Date=as.POSIXct(ifelse(length_x<5
                                                  , round_date(Date, "month")
                                                  ,Date)
                                           , origin="1970-01-01 00:00.00"
                                           , tz="GMT"))

head(test.df,5)[c(1,3:7)]
#   floor_date(Date, "month") ID                Date        Val length_x summary_Date
# 1                1962-03-01  1 1962-03-01 12:00:00 -0.1711212        3   1962-03-01
# 2                1962-03-01  1 1962-03-14 12:00:00 -0.1531571        3   1962-03-01
# 3                1962-03-01  1 1962-03-27 12:00:00  0.1256238        3   1962-04-01
# 4                1962-04-01  1 1962-04-10 12:00:00  1.4481225        2   1962-04-01
# 5                1962-04-01  1 1962-04-24 12:00:00 -0.6508731        2   1962-05-01

Recommended answer

One-step ddply solution (also posted as a comment):

ddply(test.df, .(ID, floor_date(Date, "month")), mutate, 
  length_x = length(ID), 
  summary_Date=as.POSIXct(ifelse(length_x < 5, round_date(Date, "month") ,Date)
    , origin="1970-01-01 00:00.00", tz="GMT")
)
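A note on why the one-step version gives row-specific values, offered as a sketch rather than a definitive account: plyr's mutate evaluates its arguments in order, so length_x already exists as a column when summary_Date is computed. A column holds one value per row, so the ifelse test has the same length as the piece. The base R example below mimics that sequence on a single hypothetical piece; the data frame piece, its value column, and the use of round() as a stand-in for round_date are all assumptions made for illustration.

```r
# Base R only. `piece` is a hypothetical stand-in for one ID-by-month
# subset that ddply would hand to mutate; the values are illustrative.
piece <- data.frame(ID = c(1, 1, 1), value = c(10.2, 10.4, 10.9))

# Step 1: storing the scalar count as a column recycles it to one value
# per row, as happens to length_x before summary_Date is evaluated.
piece$length_x <- length(piece$ID)
piece$length_x  # 3 3 3

# Step 2: the test now has one element per row, so ifelse() picks a
# value row by row. round() stands in for round_date() here.
piece$summary_value <- ifelse(piece$length_x < 5,
                              round(piece$value),
                              piece$value)
piece$summary_value  # 10 10 11
```

The two-step ddply version in the question's edit works for the same reason: length_x is stored as a full-length column before ifelse is called.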
