Long to wide format with several duplicates. Circumvent with unique combo of columns

Question

I have a dataset similar to this (the real one is way bigger). It is in long format and I need to change it to wide format with one row per id. My problem is that there are a lot of different combinations of time, drug, unit and admin. Only the combination of time, drug, unit and admin is unique, and each combination should occur only once per id. I could not find a solution to this. I would like R to create unique combinations of columns so the data can be transformed to wide format. I have tried

melt.data.table(df, id.vars=c(id,time,drug,unit,admin), measure.vars = c(dose), na.rm=F)

as well as with

%>% expand(nesting(time, drug, unit, admin, dose), id)

but it doesn't work. Here is mock data:

library(data.table)  # needed for as.data.table() below
id<-c(1492,1492,1492,1492,1493,1493)
time<-c("Pre-bypass","Post-bypass","Total","Post-bypass","Pre-OP","Pre-OP")
drug<-c("ACE","LEVO","LEVO","MIL","BB","BC")
unit<-c(NA,"ml/hr","ml","mg",NA,NA)
admin<-c(NA, "IV","IV","Inhale",NA,NA)
dose<-c(NA,50,40,5,NA,NA)
df<-rbind(id,time,drug,unit,admin,dose)
df<-t(df)
df<-as.data.table(df)

I would like my output to be something like this (the reason for the TRUE in the Pre.bypass.Ace.unitNA.adminNA and Pre.OP columns is that dose and unit are missing here, but because the drug is listed it was given in the standard dose and unit):

id.new<-c(1492,1493)
Pre.OP.BB.unitNA.adminNA<-c(NA,TRUE)
Pre.OP.BC.unitNA.adminNA<-c(NA,TRUE)
Total.LEVO.ml.h.IV<-c(40,NA)
Pre.bypass.Ace.unitNA.adminNA<-c(TRUE,NA)
Post.bypass.LEVO.ml.h.IV<-c(50,NA)
Post.bypass.MIL.ml.h.IV<-c(5,NA)
df.new<-rbind(id.new,Post.bypass.MIL.ml.h.IV,Pre.OP.BB.unitNA.adminNA,Pre.OP.BC.unitNA.adminNA,Total.LEVO.ml.h.IV,Pre.bypass.Ace.unitNA.adminNA,Post.bypass.LEVO.ml.h.IV)
df.new<-t(df.new)

Recommended answer

I agree with the comments that long format is usually the better way to go. If you have to use wide format, then using the tidyr package you can do the following:

library(tidyr)
df %>% 
  unite(combination, time, drug, unit, admin) %>% 
  spread(key = combination, value  = dose)
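Since tidyr 1.0.0, spread() is superseded by pivot_wider(), which can build the column names from several columns at once, so the unite() step is not needed. Below is a minimal sketch of the same idea, not part of the original answer. It assumes the mock data is rebuilt as a plain data frame (so dose stays numeric instead of being coerced to character by rbind()); the names wide and the "_" separator are illustrative choices.

```r
library(tidyr)

# Mock data from the question, rebuilt as a data frame so dose stays numeric
# (assumption: the original rbind()/t() construction is replaced here)
df <- data.frame(
  id    = c(1492, 1492, 1492, 1492, 1493, 1493),
  time  = c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP"),
  drug  = c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC"),
  unit  = c(NA, "ml/hr", "ml", "mg", NA, NA),
  admin = c(NA, "IV", "IV", "Inhale", NA, NA),
  dose  = c(NA, 50, 40, 5, NA, NA),
  stringsAsFactors = FALSE
)

# Guard: each id/time/drug/unit/admin combination must occur only once,
# otherwise a wide cell would have to hold more than one dose
stopifnot(anyDuplicated(df[c("id", "time", "drug", "unit", "admin")]) == 0)

# One row per id, one column per time/drug/unit/admin combination
wide <- pivot_wider(
  df,
  id_cols     = id,
  names_from  = c(time, drug, unit, admin),
  values_from = dose,
  names_sep   = "_"
)

print(wide)
```

Note that in this sketch a drug that is listed without a dose and a combination that never occurs for an id both end up as NA in the wide table. If the TRUE marker from the desired output matters, the missing doses would have to be recoded before reshaping.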
