R Replace Intermittent NA Values With Last Observation Carried Forward (NA.LOCF)

Views: 280. Published: 2020/10/26 4:25:08. Tags: r, dplyr, zoo


Problem description

 
Background

I need to replace the NA's in my data frame using different methods depending on the nature of each NA. My data frame comes from a study with repeated measures, where some of the NA's are the result of subjects dropping out, while others are the result of intermittent missing measurements, defined as one missing measurement or a sequence of several missing measurements followed by a measured value.
I will refer to intermittent missing measurements as intermittent NA's and to the NA's caused by dropout as dropout NA's.

Problem

I am having trouble testing whether an NA is the result of an intermittent missing measurement, and deciding which functions I should use to replace these NA's. Ideally, I would replace the intermittent NA's with the na.locf method. The dropout NA's, however, need to be replaced with the baseline value OR the last observed value, whichever is greater.

Examples

Example 1

Here is a clean example of NA's that I want to be treated as intermittent NA's with the na.locf imputation:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,NA,NA,15,16,19,NA,12,23,31))
And how I want the end result to be:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,34,34,15,16,19,19,12,23,31))
Example 2

Here is a clean example of NA's (dropout NA's) that I want to be imputed with the previous non-NA observation OR the baseline value (visit 1), whichever is greater:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,22,18,15,16,19,NA,NA,NA,NA))
And how I want the end result to be:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,22,18,15,16,19,34,34,34,34))
Example 3

Here is a more complex example with a mixture of NA's that need different imputations. For the dropout NA's, the previous non-NA observation is greater than the baseline observation (visit 1):
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,NA,NA,42,16,19,NA,38,NA,NA))
How I need the result to be:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,34,34,42,16,19,19,38,38,38))
Example 4

Another complex example, where for the dropout NA's the baseline observation (visit 1) is greater than the previous non-NA value:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(40,NA,NA,42,16,19,NA,38,NA,NA))
How I need the result to be:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(40,40,40,42,16,19,19,38,40,40))




What I have tried

As suggested by @Gregor, after I stated that this would solve my problem, it is possible to test for the presence of intermittent NA's with:
mutate(is.na(value) & !is.na(lead(value)))
But this does not help me impute all intermittent NA's. In particular, for intermittent NA's that occur in a sequence (NA1, NA2, NA3, 14), only NA3 is returned as TRUE by this test, because it is the only NA whose next value is observed.
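The limitation described above can be reproduced on a toy vector. This is only an illustrative sketch: the vector `x` and the name `flag` are made up for the demonstration and do not come from the study data.

```r
library(dplyr)

# Hypothetical toy vector: one observed value, a run of three NAs,
# then an observed value (the NA1, NA2, NA3, 14 pattern).
x <- c(10, NA, NA, NA, 14)

# The test from above: the value is NA and the NEXT value is observed.
flag <- is.na(x) & !is.na(lead(x))
flag
# [1] FALSE FALSE FALSE  TRUE FALSE
```

Only the last NA of the run is flagged, because `lead()` looks exactly one position ahead, so the earlier NA's in the run see another NA as their next value.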
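Solution
We can use na.locf(..., fromLast = TRUE) from the zoo package to identify the trailing NA values: filling backwards leaves an NA only where no later observation exists. We then use pmax on those positions to compare the carried-forward value with the baseline. We'll demonstrate on Examples 1 to 3 from your question, consolidated into a single data frame: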
# consolidate example data
dd = data.frame(
  example = rep(1:3, each = 10),
  visit = rep(1:10, 3),
  value = c(34,NA,NA,15,16,19,NA,12,23,31,
            34,22,18,15,16,19,NA,NA,NA,NA,
            34,NA,NA,42,16,19,NA,38,NA,NA),
  goal = c(34,34,34,15,16,19,19,12,23,31,
           34,22,18,15,16,19,34,34,34,34,
           34,34,34,42,16,19,19,38,38,38)
)

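library(dplyr)  # zoo must also be installed; it is called via zoo::
dd = dd %>% group_by(example) %>%
  mutate(
    # TRUE unless the value is a trailing (dropout) NA: filling backwards
    # leaves NA only where no later observation exists
    to_fill = !is.na(zoo::na.locf(value, fromLast = TRUE, na.rm = FALSE)),
    result = if_else(to_fill,
                     zoo::na.locf(value, na.rm = FALSE),                      # plain LOCF
                     pmax(first(value), zoo::na.locf(value, na.rm = FALSE)))  # max(baseline, LOCF)
  )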

all(dd$goal == dd$result)
# [1] TRUE
As you can see, the result column matches the goal column for all three consolidated examples.
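The consolidated data covers Examples 1 to 3 only. As a minimal sketch, the same logic can be wrapped in a plain function and checked against Example 4, where the baseline is greater than the last observed value. The helper name `impute_visits` is hypothetical and not part of the original answer, and it assumes the first element of each subject's vector (the baseline, visit 1) is observed.

```r
# Hypothetical helper: same logic as the mutate() call, on a plain vector.
# Assumes value[1] (the baseline) is not NA. Requires the zoo package.
impute_visits <- function(value) {
  forward  <- zoo::na.locf(value, na.rm = FALSE)                   # ordinary LOCF
  backward <- zoo::na.locf(value, fromLast = TRUE, na.rm = FALSE)  # NA only in the trailing run
  dropout  <- is.na(backward)                                      # TRUE for dropout NA's
  # Dropout NA's get max(baseline, last observation); everything else gets LOCF.
  ifelse(dropout, pmax(value[1], forward), forward)
}

# Example 4 from the question and its desired result.
ex4  <- c(40, NA, NA, 42, 16, 19, NA, 38, NA, NA)
goal <- c(40, 40, 40, 42, 16, 19, 19, 38, 40, 40)

all(impute_visits(ex4) == goal)
# [1] TRUE
```

In a grouped data frame the helper could be called inside `mutate()` after `group_by()`, so that `value[1]` refers to each subject's own baseline.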
