Lapply in a dataframe over different variables using filters
I'm trying to calculate several new variables in my dataframe. Take initial values for example:
Say I have:
Dataset <- data.frame(time=rep(c(1990:1992),2),
geo=c(rep("AT",3),rep("DE",3)),var1=c(1:6), var2=c(7:12))
time geo var1 var2
1 1990 AT 1 7
2 1991 AT 2 8
3 1992 AT 3 9
4 1990 DE 4 10
5 1991 DE 5 11
6 1992 DE 6 12
And I want:
time geo var1 var2 var1_1990 var1_1991 var2_1990 var2_1991
1 1990 AT 1 7 1 2 7 8
2 1991 AT 2 8 1 2 7 8
3 1992 AT 3 9 1 2 7 8
4 1990 DE 4 10 4 5 10 11
5 1991 DE 5 11 4 5 10 11
6 1992 DE 6 12 4 5 10 11
So both time and the variable are changing for the new variables. Here is my attempt:
initialyears <- c(1990,1991)
initialvars <- c("var1", "var2")
# ideally, I want code where I only have to change these two vectors
# and where it's possible to change their dimensions
for (i in initialyears) {
  lapply(initialvars, function(x) {
    rep(Dataset[Dataset$time == i, x], each = length(unique(Dataset$time)))
  })
}
Which runs without error but yields nothing. I would like to assign the variable names as in the example (e.g. "var1_1990") and immediately make the new variables part of the dataframe. I would also like to avoid the for loop, but I don't know how to wrap two lapply calls around this function. Should I rather have the function take two arguments? Is the problem that the apply function does not carry the results into my environment? I've been stuck here for a while, so I would be grateful for any help!
P.S.: I have a solution that does this combination by combination, without apply and the like, but I'm trying to get away from copy and paste:
Dataset$var1_1990 <- c(rep(Dataset$var1[which(Dataset$time==1990)],
each=length(unique(Dataset$time))))
This can be done with subset(), reshape(), and merge():
merge(Dataset, reshape(subset(Dataset, time %in% c(1990, 1991)), direction = 'wide', idvar = 'geo', timevar = 'time', sep = '_'))
## geo time var1 var2 var1_1990 var2_1990 var1_1991 var2_1991
## 1 AT 1990 1 7 1 7 2 8
## 2 AT 1991 2 8 1 7 2 8
## 3 AT 1992 3 9 1 7 2 8
## 4 DE 1990 4 10 4 10 5 11
## 5 DE 1991 5 11 4 10 5 11
## 6 DE 1992 6 12 4 10 5 11
The column order isn't exactly what you have in your question, but you can fix that up after-the-fact with an index operation, if necessary.
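As a rough sketch of that index operation, the version below drives everything from the two vectors in the question (`initialyears` and `initialvars`) and then reorders the columns to match the requested layout. The names `wide`, `result`, and `newcols` are hypothetical helpers introduced only for this illustration, and the explicit `order()` call is an added assumption so the row order does not depend on how `merge()` sorts.

```r
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6, var2 = 7:12)

# The only two vectors that should need editing.
initialyears <- c(1990, 1991)
initialvars <- c("var1", "var2")

# Keep only the initial years and the chosen variables, then reshape to wide
# format: one row per geo, one column per variable/year pair.
initial <- Dataset[Dataset$time %in% initialyears, c("time", "geo", initialvars)]
wide <- reshape(initial, direction = "wide", idvar = "geo",
                timevar = "time", sep = "_")

# Merge the wide columns back onto every row of the original data.
result <- merge(Dataset, wide, by = "geo")
result <- result[order(result$geo, result$time), ]

# Build the requested names (var1_1990, var1_1991, var2_1990, var2_1991)
# and reorder the columns with an index operation.
newcols <- paste(rep(initialvars, each = length(initialyears)),
                 initialyears, sep = "_")
result <- result[, c(names(Dataset), newcols)]
```

Changing `initialyears` or `initialvars` should be enough to produce a different set of new columns. As for the loop in the question: `lapply()` returns a list rather than modifying `Dataset`, and a value computed inside a `for` loop is discarded unless it is assigned, which is why the attempt appears to do nothing.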