Adding na.omit() in mutate with lag and cummax causes "Error: Column must be length x (the group size) or one, not 0"


Problem description

I'm using dplyr to mutate columns in my data frame. The goal is to compute the ratio of the current row value to the maximum value seen so far (basically a combination of lag and cummax). It works well, except when there is an NA value: every calculation after it in the same group becomes NA.

I tried placing na.omit() in various spots. It might work in principle, but the function fails because na.omit() changes the length of the vectors.
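The two behaviours described above can be reproduced outside dplyr. This is only a small base R sketch on a made-up vector `x` (not the data from the question): `cummax()` returns NA from the first NA onward, and `na.omit()` returns a shorter vector.

```r
# Hypothetical example vector, not taken from the question's data
x <- c(100, 80, NA, 30, 120)

# cummax() yields NA at the first NA and at every position after it
running_max <- cummax(x)
print(running_max)

# na.omit() drops the NA, so the result no longer lines up with x
cleaned <- na.omit(x)
print(length(x))        # original length
print(length(cleaned))  # one element shorter
```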

Here is my reproducible code:

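# Sample data: two value columns and a grouping column (group 4 contains only NAs)
v1 <- c(NA, 100, 80, 40, NA, 30, 100, 40, 20, 10, NA, NA, 1, NA)
v2 <- c(100, 100, 90, 50, NA, -40, NA, -10, NA, NA, NA, 1, NA, NA)
group <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4)

x1 <- as.data.frame(cbind(v1, v2, group))

library(dplyr)

# For each column, divide the current value by the lagged running maximum within its group
for (i in c("v1", "v2")) {
  x1 <- x1 %>%
    group_by(group) %>%
    mutate(!!sym(paste(i, "_max_lag_ratio", sep = "")) :=
             get(i) / lag(as.vector(cummax(get(i))), default = first(get(i))))
}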

If I add na.omit() as follows:

mutate( !!sym(paste( i,"_max_lag_ratio", sep="")) := get(i)/ lag( cummax( na.omit(get(i)))  , default=first( get(i)  )))

I get the following error:

Error: Column `column_max_lag_ratio` must be length 1 (the group size), not 0

This is most likely because one group (group 4) contains only NAs. How can I make this fail-safe? My real dataset contains "imperfect" data. Any help is greatly appreciated, since I'm really stuck.
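The suspicion about group 4 can be checked in base R. This sketch assumes the group's only `v1` value is NA, as in the sample data. After `na.omit()` nothing is left, so `cummax()` returns a zero-length vector, and dividing by it also gives length 0, where `mutate()` expects length 1. In groups that keep some values, the shortened vector is recycled instead (base R warns when the lengths do not divide evenly), so those groups do not raise this particular error.

```r
# The single v1 value in group 4 of the sample data
g4 <- c(NA_real_)

# Nothing survives na.omit(), so every later step has length 0
g4_clean <- na.omit(g4)
g4_ratio <- g4 / cummax(g4_clean)
print(length(g4_ratio))   # 0, but the group has 1 row

# Group 1 of v1: six rows, four of them non-NA; the division recycles
g1 <- c(NA, 100, 80, 40, NA, 30)
g1_ratio <- suppressWarnings(g1 / cummax(na.omit(g1)))
print(length(g1_ratio))   # still 6, but the values are misaligned
```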

Recommended answer

I came up with this workaround, and it did the trick.

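# Sample data: two value columns and a grouping column (group 4 contains only NAs)
v1 <- c(NA, 100, 80, 40, NA, 30, 100, 40, 20, 10, NA, NA, 1, NA)
v2 <- c(100, 100, 90, 50, NA, -40, NA, -10, NA, NA, NA, 1, NA, NA)
group <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4)

x1 <- as.data.frame(cbind(v1, v2, group))

library(dplyr)

# ifelse() keeps the vector at the group's length; NA positions are filled from na.omit(get(i))
for (i in c("v1", "v2")) {
  x1 <- x1 %>%
    group_by(group) %>%
    mutate(!!sym(paste(i, "_max_lag_ratio", sep = "")) :=
             get(i) / lag(cummax(ifelse(is.na(get(i)), na.omit(get(i)), get(i))),
                          default = first(get(i))))
}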
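One caveat with the workaround above: as far as I can tell, `ifelse()` recycles the shortened `na.omit()` vector by position, so the value substituted for an NA is not necessarily the value that came before it. A possible alternative, sketched here under my own assumptions and not taken from the answer, is to replace NAs with `-Inf` before calling `cummax()`, so that NAs never raise the running maximum. The helper name `na_safe_max_lag_ratio` is hypothetical, and the sketch uses base R `ave()` for grouping instead of dplyr so that it runs without extra packages. It also assumes the data contains no genuine `-Inf` values, and that the first non-NA value in a group should get a ratio of 1, mirroring `default = first(get(i))`.

```r
# Hypothetical helper: ratio of each value to the running max of earlier non-NA values
na_safe_max_lag_ratio <- function(x) {
  if (length(x) == 0) return(numeric(0))
  # Treat NA as -Inf so it can never become the running maximum
  run_max <- cummax(ifelse(is.na(x), -Inf, x))
  # Positions before the first non-NA value have no maximum yet
  run_max[run_max == -Inf] <- NA_real_
  # Shift by one row: each row sees the maximum of the rows before it
  prev_max <- c(NA_real_, head(run_max, -1))
  # With no earlier value, fall back to the current value (ratio 1, or NA if x is NA)
  no_prev <- is.na(prev_max)
  prev_max[no_prev] <- x[no_prev]
  x / prev_max
}

# Sample data from the question
x1 <- data.frame(
  v1 = c(NA, 100, 80, 40, NA, 30, 100, 40, 20, 10, NA, NA, 1, NA),
  v2 = c(100, 100, 90, 50, NA, -40, NA, -10, NA, NA, NA, 1, NA, NA),
  group = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4)
)

# Apply the helper per group; ave() returns a vector aligned with the original rows
for (i in c("v1", "v2")) {
  x1[[paste0(i, "_max_lag_ratio")]] <- ave(x1[[i]], x1$group, FUN = na_safe_max_lag_ratio)
}
print(x1)
```

Because the helper always returns a vector of the same length as its input, it should also be usable inside `group_by(group) %>% mutate(...)`, where an all-NA group such as group 4 yields a single NA rather than a zero-length result.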
