Adding na.omit() in mutate with lag and cummax causes "Error: Column must be length x (the group size) or one, not 0"
Problem description
I'm using dplyr to mutate columns in my data frame. The mutation computes the ratio of the current row's value to the maximum value seen so far (basically a combination of lag and cummax). It works well, except when there is an NA value: all subsequent calculations then become NA.
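The NA propagation described here is the documented behaviour of base R's cummax(): once an NA is encountered, that element and every later element of the result are NA. A minimal sketch with made-up values (the vector `x` is illustrative only, not taken from the data set in the question):

```r
# Illustrative values only; not the asker's data.
x <- c(100, 80, NA, 30, 120)

# cummax() has no na.rm argument, so the NA at position 3
# makes positions 3, 4 and 5 of the running maximum NA.
running <- cummax(x)
print(running)
```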
I tried placing na.omit() in various places, but the function fails because na.omit() changes the length of the vectors.
Here is my reproducible code:
v1<-c(NA,100,80,40,NA,30,100,40,20,10,NA,NA,1,NA)
v2<-c(100,100,90,50,NA,-40,NA,-10,NA,NA,NA,1,NA,NA)
group<-c(1,1,1,1,1,1,2,2,2,2,2,3,3,4)
x1<-as.data.frame(cbind(v1,v2,group))
library(dplyr)
for (i in c("v1", "v2")) {
  x1 <- x1 %>%
    group_by(group) %>%
    mutate(!!sym(paste(i, "_max_lag_ratio", sep = "")) := get(i) / lag(as.vector(cummax(get(i))), default = first(get(i))))
}
If I add na.omit() as follows:
mutate( !!sym(paste( i,"_max_lag_ratio", sep="")) := get(i)/ lag( cummax( na.omit(get(i))) , default=first( get(i) )))
I get the following error:
Error: Column `column_max_lag_ratio` must be length 1 (the group size), not 0
This is most likely because one group (group 4) contains only NAs. How can I make this failsafe? My real data set contains "imperfect" data. Any help is greatly appreciated, since I'm really stuck.
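The length mismatch can be reproduced outside dplyr. The sketch below uses base R only; `all_na_group` is a hypothetical stand-in for the single-row group 4. na.omit() drops the only element, so everything computed from it has length 0, which matches the "must be length 1 (the group size), not 0" message:

```r
# Hypothetical stand-in for a group whose only value is NA.
all_na_group <- NA_real_

# na.omit() removes the NA, leaving a zero-length vector.
denom <- cummax(na.omit(all_na_group))

# Arithmetic with a zero-length operand yields a zero-length result,
# which mutate() cannot fit into a group of size 1.
ratio <- all_na_group / denom

print(length(denom))
print(length(ratio))
```

In groups that contain some non-NA values, na.omit() still returns fewer elements than the group has rows, so the shorter vector would be recycled against the longer one and the ratios may be misaligned even when no error is raised.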
Recommended answer
I came up with this workaround, and it did the trick.
v1<-c(NA,100,80,40,NA,30,100,40,20,10,NA,NA,1,NA)
v2<-c(100,100,90,50,NA,-40,NA,-10,NA,NA,NA,1,NA,NA)
group<-c(1,1,1,1,1,1,2,2,2,2,2,3,3,4)
x1<-as.data.frame(cbind(v1,v2,group))
library(dplyr)
for (i in c("v1", "v2")) {
  x1 <- x1 %>%
    group_by(group) %>%
    # ifelse() returns a vector as long as the group, so the length check in mutate() passes
    mutate(!!sym(paste(i, "_max_lag_ratio", sep = "")) := get(i) / (lag(cummax(ifelse(is.na(get(i)), na.omit(get(i)), get(i))), default = first(get(i)))))
}
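One caveat about the workaround above: ifelse() recycles na.omit(get(i)) to the group length, so each NA position receives whichever non-NA value happens to line up with it, which is not necessarily the running maximum so far. As an alternative sketch (not part of the original answer), the NAs could be replaced with -Inf before calling cummax(), since -Inf can never raise a maximum. The helper name `cummax_na` is hypothetical, and the sketch assumes the data contain no genuine -Inf values:

```r
# Hypothetical helper: running maximum that skips NAs and keeps the input length.
# Assumes x contains no genuine -Inf values.
cummax_na <- function(x) {
  filled <- ifelse(is.na(x), -Inf, x)  # NA cannot raise the running maximum
  out <- cummax(filled)
  out[out == -Inf] <- NA               # no non-NA value has been seen yet
  out
}

print(cummax_na(c(NA, 100, 80, 40, NA, 30)))  # NA 100 100 100 100 100
print(cummax_na(NA_real_))                    # NA, still length 1

# Where the two approaches differ: the recycled fill puts 2 into position 2.
x <- c(1, NA, 2, 50)
print(cummax(ifelse(is.na(x), na.omit(x), x)))  # 1 2 2 50
print(cummax_na(x))                             # 1 1 2 50
```

Inside the loop, the cummax(ifelse(...)) expression could then be swapped for cummax_na(get(i)). An all-NA group yields NA ratios rather than an error, because the result always has the same length as the group.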