Lapply in a dataframe over different variables using filters

Views: 168 | Posted: 2017/3/26 2:39:15 | Tags: r, dataframe, lapply


Question

I'm trying to calculate several new variables in my dataframe. Take initial values for example:

Say I have:

Dataset <- data.frame(time=rep(c(1990:1992),2),
           geo=c(rep("AT",3),rep("DE",3)),var1=c(1:6), var2=c(7:12))

        time    geo var1 var2
1       1990    AT  1    7
2       1991    AT  2    8
3       1992    AT  3    9
4       1990    DE  4   10
5       1991    DE  5   11
6       1992    DE  6   12

And I want:

        time    geo  var1  var2  var1_1990  var1_1991  var2_1990 var2_1991
1       1990    AT   1     7      1          2          7         8
2       1991    AT   2     8      1          2          7         8
3       1992    AT   3     9      1          2          7         8
4       1990    DE   4     10     4          5          10        11
5       1991    DE   5     11     4          5          10        11
6       1992    DE   6     12     4          5          10        11

So both time and the variable are changing for the new variables. Here is my attempt:

initialyears <- c(1990, 1991)
initialvars <- c("var1", "var2")
# ideally, I want code where I only have to change these two vectors
# and where it's possible to change their dimensions

for (i in initialyears) {
  lapply(initialvars, function(x) {
    rep(Dataset[Dataset$time == i, x], each = length(unique(Dataset$time)))
  })
}

This runs without error but yields nothing. I would like to assign the variable names from the example (e.g. "var1_1990") and immediately make the new variables part of the dataframe. I would also like to avoid the for loop, but I don't know how to wrap two lapply calls around this function. Should the function take two arguments instead? Is the problem that lapply does not carry its results into my environment? I've been stuck here for a while, so I would be grateful for any help!

P.S.: I have a solution that does this one combination at a time, without apply and the like, but I'm trying to get away from copy and paste:

Dataset$var1_1990 <- c(rep(Dataset$var1[which(Dataset$time==1990)],
                      each=length(unique(Dataset$time))))

Solution

This can be done with subset(), reshape(), and merge():

merge(Dataset, reshape(subset(Dataset, time %in% c(1990, 1991)), direction = 'wide', idvar = 'geo', sep = '_'))
##   geo time var1 var2 var1_1990 var2_1990 var1_1991 var2_1991
## 1  AT 1990    1    7         1         7         2         8
## 2  AT 1991    2    8         1         7         2         8
## 3  AT 1992    3    9         1         7         2         8
## 4  DE 1990    4   10         4        10         5        11
## 5  DE 1991    5   11         4        10         5        11
## 6  DE 1992    6   12         4        10         5        11
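
If the one-liner feels dense, the same pipeline can be unpacked into steps and driven by the two vectors from the question. This is only a sketch: `initialyears` and `initialvars` are taken from the question, the names `base_rows`, `wide` and `result` are illustrative, and it assumes each `geo` has exactly one row per initial year.

```r
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6, var2 = 7:12)

# The only two things that should need changing (assumed from the question).
initialyears <- c(1990, 1991)
initialvars  <- c("var1", "var2")

# 1. Keep only the rows for the initial years and the columns needed.
base_rows <- subset(Dataset, time %in% initialyears,
                    select = c("geo", "time", initialvars))

# 2. Spread them to one row per geo, with columns named <var>_<year>.
wide <- reshape(base_rows, direction = "wide",
                idvar = "geo", timevar = "time",
                v.names = initialvars, sep = "_")

# 3. Join back onto the full data by the shared column, geo.
result <- merge(Dataset, wide, by = "geo")
print(result)
```

Spelling out `timevar` and `v.names` is optional here, since `reshape()` defaults to a column called `time` and to all remaining columns, but it makes the dependence on the two vectors explicit.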

The column order isn't exactly what you have in your question, but you can fix that up after-the-fact with an index operation, if necessary.
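
As a rough illustration of that index operation, the sketch below rebuilds the column order from the question. The vectors `initialyears` and `initialvars` come from the question's setup, and the names `merged` and `new_cols` are assumptions introduced here rather than part of the original answer.

```r
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6, var2 = 7:12)
initialyears <- c(1990, 1991)
initialvars  <- c("var1", "var2")

merged <- merge(Dataset,
                reshape(subset(Dataset, time %in% initialyears),
                        direction = "wide", idvar = "geo", sep = "_"))

# Build the new column names grouped by variable, then by year:
# var1_1990, var1_1991, var2_1990, var2_1991.
new_cols <- paste(rep(initialvars, each = length(initialyears)),
                  initialyears, sep = "_")

# Index by name: the original columns first, then the new ones in order.
merged <- merged[, c(names(Dataset), new_cols)]
print(names(merged))
```

Indexing by name rather than by position keeps the reordering correct if the lengths of the two vectors change.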

