Generate a self-reference key within a table using R mutate in a data frame


Problem description

I have an input table with three columns: Person_Id, Visit_Id (a unique ID for each visit of each person), and Purpose, as shown below. I would like to generate a new column that gives the person's immediately preceding visit. For example, if a person visited the hospital with Visit_Id = 2, the new column, called "Preceding_visit_Id", would be 1; if Visit_Id = 5, the preceding visit ID would be 4. Is there an elegant way to do this using the mutate function?

Input table

Output table

As you can see, the 'Preceding_visit_id' column refers to the person's previous visit, as identified by the Visit_Id column.

Please note that this is a transformation of just one column in a large program, so an elegant approach would be helpful.

The dput() output is here:

structure(list(Person_Id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
3, 3, 3), Visit_Id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14), Purpose = c("checkup", "checkup", "checkup", "checkup", 
"checkup", "checkup", "checkup", "checkup", "checkup", "checkup", 
"checkup", "checkup", "checkup", "checkup"), Preceding_visit_id = c(NA, 
1, 2, 3, 4, NA, 6, 7, 8, 9, 10, NA, 12, 12)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L), spec = 
structure(list(
 cols = list(Person_Id = structure(list(), class = c("collector_double", 
"collector")), Visit_Id = structure(list(), class = c("collector_double", 
"collector")), Purpose = structure(list(), class = 
 c("collector_character", 
"collector")), Preceding_visit_id = structure(list(), class = 
 c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

Recommended answer

The Person_Id fields in your examples don't match.

I'm not sure if this is what you're after, but from your dput() (stored here as df_output) I created a data frame with the last column removed:

library(dplyr)
# df_output is the data frame produced by the dput() output above
df_input <- df_output %>% 
  select(-Preceding_visit_id)

Then do the following:

df_input %>% 
  group_by(Person_Id) %>% 
  mutate(Preceding_visit_id = lag(Visit_Id))

The output is:

# A tibble: 14 x 4
# Groups:   Person_Id [3]
   Person_Id Visit_Id Purpose Preceding_visit_id
       <dbl>    <dbl> <chr>                <dbl>
 1         1        1 checkup                 NA
 2         1        2 checkup                  1
 3         1        3 checkup                  2
 4         1        4 checkup                  3
 5         1        5 checkup                  4
 6         2        6 checkup                 NA
 7         2        7 checkup                  6
 8         2        8 checkup                  7
 9         2        9 checkup                  8
10         2       10 checkup                  9
11         2       11 checkup                 10
12         3       12 checkup                 NA
13         3       13 checkup                 12
14         3       14 checkup                 13
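
Note that lag() works on row order, so the result above relies on each person's rows already being sorted by Visit_Id. If that cannot be guaranteed, one possible variation is to pass the order_by argument of dplyr's lag(). The sketch below uses a small made-up data frame (the name visits and its values are illustrative, not taken from the question) whose rows are deliberately out of order:

```r
library(dplyr)

# Hypothetical input: same columns as in the question, rows shuffled
visits <- data.frame(
  Person_Id = c(1, 2, 1, 2, 1),
  Visit_Id  = c(3, 5, 1, 4, 2),
  Purpose   = "checkup"
)

result <- visits %>%
  group_by(Person_Id) %>%
  # order_by makes lag() follow Visit_Id order rather than row order
  mutate(Preceding_visit_id = lag(Visit_Id, order_by = Visit_Id)) %>%
  ungroup() %>%
  # sorting here is only for readability of the printed result
  arrange(Person_Id, Visit_Id)

print(result)
```

Calling ungroup() at the end avoids carrying the grouping into later steps of a larger pipeline. Sorting with arrange(Person_Id, Visit_Id) before the mutate() would be an alternative to order_by when changing the row order is acceptable.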
