Lapply in a dataframe over different variables using filters


Problem description

I'm trying to calculate several new variables in my dataframe. Take initial values for example:

Say I have:

Dataset <- data.frame(time=rep(c(1990:1992),2),
           geo=c(rep("AT",3),rep("DE",3)),var1=c(1:6), var2=c(7:12))

        time    geo var1 var2
1       1990    AT  1    7
2       1991    AT  2    8
3       1992    AT  3    9
4       1990    DE  4   10
5       1991    DE  5   11
6       1992    DE  6   12

And I want:

        time    geo  var1  var2  var1_1990  var1_1991  var2_1990 var2_1991
1       1990    AT   1     7      1          2          7         8
2       1991    AT   2     8      1          2          7         8
3       1992    AT   3     9      1          2          7         8
4       1990    DE   4     10     4          5          10        11
5       1991    DE   5     11     4          5          10        11
6       1992    DE   6     12     4          5          10        11

So each new variable depends on both a year and a source variable. Here is my attempt:

initialyears <- c(1990, 1991)
initialvars <- c("var1", "var2")
# ideally, I want code where I only have to change these two vectors
# and where it's possible to change their lengths

for (i in initialyears){
  lapply(initialvars, function(x){
    rep(Dataset[Dataset$time==i, x], each=length(unique(Dataset$time)))
  })
}

This runs without error but yields nothing. I would like to assign the variable names from the example (e.g. "var1_1990") and immediately make the new variables part of the dataframe. I would also like to avoid the for loop, but I don't know how to wrap two lapply calls around this function. Should the function take two arguments instead? Is the problem that lapply does not carry its results into my environment? I've been stuck here for a while, so I would be grateful for any help!

P.S.: I have a solution that does this one combination at a time, without apply and the like, but I'm trying to get away from copy and paste:

Dataset$var1_1990 <- c(rep(Dataset$var1[which(Dataset$time==1990)],
                      each=length(unique(Dataset$time))))

Solution

This can be done with subset(), reshape(), and merge():

merge(Dataset, reshape(subset(Dataset, time %in% c(1990, 1991)), direction='wide', idvar='geo', sep='_'))
##   geo time var1 var2 var1_1990 var2_1990 var1_1991 var2_1991
## 1  AT 1990    1    7         1         7         2         8
## 2  AT 1991    2    8         1         7         2         8
## 3  AT 1992    3    9         1         7         2         8
## 4  DE 1990    4   10         4        10         5        11
## 5  DE 1991    5   11         4        10         5        11
## 6  DE 1992    6   12         4        10         5        11

The column order isn't exactly what you have in your question, but you can fix that up after-the-fact with an index operation, if necessary.
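
If you want the whole thing driven by the two vectors from the question, something along these lines might work. This is only a sketch: the names `base`, `wide`, `merged`, `newcols` and `looped` are made up for illustration, and it assumes each geo/time combination occurs exactly once. The first part parameterises the reshape()/merge() call and restores the column order by indexing. The second part is a plain loop closer to the original attempt. That attempt yields nothing because lapply() only returns a list, and the for loop discards the return value instead of assigning it to the data frame.

```r
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = rep(c("AT", "DE"), each = 3),
                      var1 = 1:6, var2 = 7:12)

# The only two things that should need changing
initialyears <- c(1990, 1991)
initialvars  <- c("var1", "var2")

# --- Part 1: reshape() + merge(), driven by the two vectors ---
# Keep only the reference years and the variables of interest
base <- Dataset[Dataset$time %in% initialyears, c("geo", "time", initialvars)]
# One row per geo, one column per variable/year pair (e.g. var1_1990)
wide <- reshape(base, direction = "wide", idvar = "geo",
                timevar = "time", sep = "_")
merged <- merge(Dataset, wide, by = "geo")

# Restore the row and column order used in the question
newcols <- paste(rep(initialvars, each = length(initialyears)),
                 initialyears, sep = "_")
merged <- merged[order(merged$geo, merged$time),
                 c("time", "geo", initialvars, newcols)]
rownames(merged) <- NULL

# --- Part 2: explicit loop that assigns each new column ---
looped <- Dataset
for (v in initialvars) {
  for (y in initialyears) {
    at_y <- Dataset[Dataset$time == y, ]   # rows for the reference year
    # Look up each row's geo in the reference-year rows
    looped[[paste(v, y, sep = "_")]] <- at_y[[v]][match(looped$geo, at_y$geo)]
  }
}

print(merged)
print(looped)
```

The loop looks values up with match() on geo rather than rep(..., each = ...), so it does not depend on the rows being sorted or on every geo having the same number of years.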
