Last observation carried forward by group over multiple columns


Question

I have a dataset with observations of multiple patients and their diagnoses over time. There are 9 different dummy variables, each representing a specific diagnosis, named e.g. L40, L41, K50, M05 and so on.

Where there are missing values in the dummy variables, I want to carry forward the last non-missing value by patient, so that once a patient receives a diagnosis, it will follow through to subsequent observations.

I started with this, using the na.locf function from the zoo package.

diagdata <- originaldata[,grep("^patient|^ar|^edatum|^K|^L|^M",colnames(originaldata))]

require(zoo)
require(data.table)

diagnosis <- data.table(diagdata)

diagnosis[,L40:=na.locf(L40),by=patient]

This achieves what I am looking for, but only on the column in question (L40). Is there any way of applying the above to all the relevant diagnosis columns, i.e. columns starting with K, L and M?

Recommended answer

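# Select every diagnosis column (names starting with K, L or M)
cols <- grep("^K|^L|^M", names(diagnosis), value = TRUE)

# Carry the last non-missing value forward within each patient;
# na.rm = FALSE keeps leading NAs so each group keeps its original length
diagnosis[, (cols) := na.locf(.SD, na.rm = FALSE), by = patient, .SDcols = cols]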
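To see what the call above does, here is a minimal sketch on made-up data. The table contents (two patients, the columns `ar`, `L40` and `K50`, and all values) are hypothetical and only stand in for the asker's real data. It also shows why `na.rm = FALSE` matters: by default `na.locf` drops leading NAs, so the result can be shorter than the group it is assigned to.

```r
library(data.table)
library(zoo)

# Default na.rm = TRUE drops leading NAs, so the length changes
x <- c(NA, 1, NA)
len_default <- length(na.locf(x))                 # 2
len_kept    <- length(na.locf(x, na.rm = FALSE))  # 3

# Hypothetical toy data: two patients, three yearly observations each
diagnosis <- data.table(
  patient = c(1, 1, 1, 2, 2, 2),
  ar      = c(2001, 2002, 2003, 2001, 2002, 2003),
  L40     = c(NA, 1, NA, 0, NA, NA),
  K50     = c(0, NA, 1, NA, NA, 1)
)

# All diagnosis columns: names starting with K, L or M
cols <- grep("^K|^L|^M", names(diagnosis), value = TRUE)

# Fill each selected column forward, separately for every patient
diagnosis[, (cols) := na.locf(.SD, na.rm = FALSE), by = patient, .SDcols = cols]

print(diagnosis)
```

After the update, patient 1 has `L40` equal to NA, 1, 1 and patient 2 has 0, 0, 0. Values before a patient's first non-missing observation stay NA, since there is nothing to carry forward yet, and values never leak from one patient to the next.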

See also the related question: Efficient locf by groups in a single R data.table
