Make a new column that adds 30 days to the date on every row in R

Problem description

I have a data frame df with two variables, names and dates. I would like to create a new column (new_dates) that starts from the first date belonging to each person (the dates column holds a single repeated date per person) and adds 30 days for each row going down.

The desired output is below. For each person, row 1 holds the original date, row 2 holds row 1 + 30 days, row 3 holds row 2 + 30 days, and so on.

dff
   names      dates  new_dates
1   john 2010-06-01 2010-06-01
2   john 2010-06-01 2010-07-01
3   john 2010-06-01 2010-07-31
4   john 2010-06-01 2010-08-30
5   mary 2010-07-09 2010-07-09
6   mary 2010-07-09 2010-08-08
7   mary 2010-07-09 2010-09-07
8   mary 2010-07-09 2010-10-07
9    tom 2010-06-01 2010-06-01
10   tom 2010-06-01 2010-07-01
11   tom 2010-06-01 2010-07-31
12   tom 2010-06-01 2010-08-30
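
The post only shows printed output, not how df was built. A minimal sketch that reproduces the input columns, assuming names is a character column and dates is stored as a Date (this construction is an assumption, not part of the original post):

```r
# Hypothetical reconstruction of the input data frame.
# Only the printed output appears in the post, so the types here are assumed.
df <- data.frame(
  names = rep(c("john", "mary", "tom"), each = 4),
  dates = as.Date(rep(c("2010-06-01", "2010-07-09", "2010-06-01"), each = 4)),
  stringsAsFactors = FALSE
)
str(df)
```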

I thought I could use transform for this. Here is my attempt, but it doesn't quite work for me.

dt <- transform(df, new_date = c(dates[2]+30, NA))

Recommended answer

data.table makes this easy. Once you convert to a data table, it is basically one command. The main problem with your version is that you need to split the data by name first, so that you can get the minimum date for each person, and then add the appropriate multiple of 30 days to each date.
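
To make "the appropriate multiple of 30 days" concrete, here is a small base R sketch for a single person with four rows. The variable names (first_date, n_rows, offsets) are illustrative and do not come from the original post:

```r
# One person's first date and number of rows (values taken from the john rows).
first_date <- as.Date("2010-06-01")
n_rows <- 4L

# Row i gets an offset of 30 * (i - 1) days: 0, 30, 60, 90.
offsets <- 30 * (seq_len(n_rows) - 1L)

# Adding a number to a Date shifts it by that many days.
new_dates <- first_date + offsets
print(new_dates)
# [1] "2010-06-01" "2010-07-01" "2010-07-31" "2010-08-30"
```

The sketch uses seq_len(n_rows) - 1L rather than 0:(n_rows - 1L). The two agree whenever there is at least one row, but the seq_len form also yields an empty vector for an empty group, whereas 0:-1 would count down.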

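library(data.table)
df$dates <- as.Date(df$dates)  # make sure dates is a Date, not character
dt <- as.data.table(df)
# Within each name: the first date plus 0, 30, 60, ... days
dt[,
   list(dates, new_dates = min(dates) + 0:(length(dates) - 1L) * 30),
   by = names
]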
#     names      dates  new_dates
#  1:  john 2010-06-01 2010-06-01
#  2:  john 2010-06-01 2010-07-01
#  3:  john 2010-06-01 2010-07-31
#  4:  john 2010-06-01 2010-08-30
#  5:  mary 2010-07-09 2010-07-09
#  6:  mary 2010-07-09 2010-08-08
#  7:  mary 2010-07-09 2010-09-07
#  8:  mary 2010-07-09 2010-10-07
#  9:   tom 2010-06-01 2010-06-01
# 10:   tom 2010-06-01 2010-07-01
# 11:   tom 2010-06-01 2010-07-31
# 12:   tom 2010-06-01 2010-08-30

Here is a version that hopefully shows why yours didn't work. I still prefer data.table, but since this is very close to what you were doing, it should make clear what you need to change:

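# Add new_dates to the rows of a single person.
re_date <- function(df) {
  transform(
    df[order(df$dates), ],
    new_dates = min(dates) + 30 * 0:(length(dates) - 1L)
  )
}
# Split by person, process each piece, then recombine.
do.call(rbind, lapply(split(df, df$names), re_date))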

Starting with the bottom line (do.call...), the split call makes a list with three data frames: one with the values for John, one for Mary, and one for Tom. The lapply call then runs each of those data frames through the re_date function, which adds the new_dates column. Finally, do.call/rbind stitches the pieces back together into one data frame.
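
The same three steps can be run one at a time to inspect the intermediate results. This is a sketch on assumed sample data (the data frame below is reconstructed from the printed output in the question), and add_dates is a hypothetical helper written without transform:

```r
# Hypothetical sample input, reconstructed from the output shown in the question.
df <- data.frame(
  names = rep(c("john", "mary", "tom"), each = 4),
  dates = as.Date(rep(c("2010-06-01", "2010-07-09", "2010-06-01"), each = 4))
)

# Step 1: split() returns a named list with one data frame per person.
pieces <- split(df, df$names)
print(names(pieces))

# Step 2: add new_dates inside each piece (first date plus 0, 30, 60, ... days).
add_dates <- function(piece) {
  piece$new_dates <- min(piece$dates) + 30 * (seq_len(nrow(piece)) - 1L)
  piece
}
pieces <- lapply(pieces, add_dates)

# Step 3: stitch the pieces back into one data frame.
# rbind prefixes row names with the list names (john.1, ...), so reset them.
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```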
