Remove duplicated rows by ID in R data.table, but add a new column with the concatenated dates from another column


Problem description

I have a large data table of patient data. I want to delete rows where "id" is duplicated without losing the information in the "date" column.

id  date
01  2004-07-01
02  NA
03  2013-11-15
03  2005-03-15
04  NA
05  2011-07-01
05  2012-07-01

I could do this in one of two ways:

  1. create a column that overwrites the date column with all the dates for that ID concatenated, e.g.:

    id  date_new
    01  2004-07-01
    02  NA
    03  2013-11-15; 2005-03-15
    04  NA
    05  2011-07-01; 2012-07-01
    

or

  2. create one new column for each additional date, e.g.:

    id  date_new    date_new2
    01  2004-07-01  NA
    02  NA          NA
    03  2013-11-15  2005-03-15
    04  NA          NA
    05  2011-07-01  2012-07-01
    

I have tried a few things, but they keep crashing my R session (I get the message "R Session Aborted. R encountered a fatal error. The session was terminated."):

setkey(DT, "id")
unique_DT <- subset(unique(DT))

and:

DT[!duplicated(DT[, "id", with = FALSE])]

However, besides crashing R, neither of these solutions does what I want with the dates.

Any ideas? I am new to data.table (and to R in general), but I have a vague sense that I could solve this with := somehow.

Solution

Try this:

DT[, c(date_new = paste(date, collapse = "; "), .SD), by = id]
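
As far as I can tell, the one-liner above keeps every original row (date_new is just repeated within each id), and paste() turns a missing date into the string "NA". Below is a minimal sketch of both target shapes from the question, not a definitive implementation. It assumes the table is called DT as in the question and that date is stored as character. The names collapse_dates, idx, tmp, long and wide are made up for this illustration, and dropping NA before pasting is my own choice.

```r
library(data.table)

# Toy data copied from the question (date kept as character: an assumption).
DT <- data.table(
  id   = c("01", "02", "03", "03", "04", "05", "05"),
  date = c("2004-07-01", NA, "2013-11-15", "2005-03-15",
           NA, "2011-07-01", "2012-07-01")
)

# Hypothetical helper: paste the non-missing dates of one id together,
# returning a real NA (not the string "NA") when there is nothing to paste.
collapse_dates <- function(x, sep = "; ") {
  x <- x[!is.na(x)]
  if (length(x) == 0L) NA_character_ else paste(x, collapse = sep)
}

# Shape 1: one row per id, all dates concatenated into date_new.
long <- DT[, .(date_new = collapse_dates(date)), by = id]

# Shape 2: one row per id, one column per date.
# Work on a copy so the helper index column does not leak into DT.
tmp <- copy(DT)[, idx := seq_len(.N), by = id]
wide <- dcast(tmp, id ~ idx, value.var = "date")

# dcast names the new columns "1" and "2" here; rename them to match the
# question. With more than two dates per id there would be more columns.
setnames(wide, old = c("1", "2"), new = c("date_new", "date_new2"))

print(long)
print(wide)
```

Grouping with by = id already yields one row per id, so no separate unique() or duplicated() step is needed, and := is not required because the result is a new, smaller table rather than a column added in place.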
