使用“apply"处理 data.frames 列表函数族 [英] Processing the list of data.frames with "apply" family of functions

查看:24
本文介绍了使用“apply"处理 data.frames 列表函数族的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据框,然后我将其拆分为三个(或任意数量)的数据框.

I have a data frame which I then split into three (or any number) of dataframes.

我想要做的是自动处理每个数据框中的每一列,并添加现有变量的滞后版本.

What I’m trying to do is to automatically process each column in each dataframe and add lagged versions of existing variables.

例如,如果每个 data.frame 中有三个变量(V1、V2、V3),我想自动(没有硬编码)添加 V1.lag、V2.lag 和 V3.lag.

For example if there were three variables in each data.frame (V1, V2, V3) I would like to automatically (without hardcoding) add V1.lag, V2.lag and V3.lag.

这是我目前所拥有的,但我现在卡住了.

Here is what I have so far, but I’m stuck now.

任何帮助都将受到高度赞赏.

Any help would be highly apprecaited.

dd<-data.frame(matrix(rnorm(216),72,3),c(rep("A",24),rep("B",24),rep("C",24)),c(rep("J",36),rep("K",36)));
colnames(dd) <- c("v1", "v2", "v3", "dim1", "dim2");
dd;
dds <- split(dd, dd$dim1);
dds;
# Missing step 1: Automatically create v1.lag, v2.lag, v3.lag, etc (if required)

最后,我想将三个数据框合并为一个包含新创建变量的大数据框.

Finally I would like to merge the three data frames into one big dataframe which will include newly created variables.

# Missing step 2: Merge data frames into single data frame

任何帮助将不胜感激.

在评论部分,我询问了移动平均线而不是滞后.这是解决方案:

In comments section I asked about moving averages instead of lags. here is the solution:

ma <- function(x, f=c(1,1,1)){as.numeric(filter(x, f, sides=1)/length(f));}
foo <- function(df, f = c(1,1,1)) {
nums <- sapply(df, is.numeric); ## which are numeric vars
nams <- paste(names(df)[nums], "ma", length(f), sep = "."); ## generate new names    foo.ma
df[, nams] <- lapply(which(nums), function(id, df, f) ma(df[[id]], f = f), df = df, f = f); ## apply ma to each numeric variable
  df; ## return
   }

推荐答案

这是一种选择:

## reuse @Andrie's clag() function as lag() is silly
clag <- function(x, n = 1) c(rep(NA, n), head(x, -n))
## wrapper function to do the addition of lag variables for single DF
foo <- function(df, n = 1) {
  nums <- sapply(df, is.numeric) ## which are numeric vars
  nams <- paste(names(df)[nums], "lag", sep = ".") ## generate new names foo.lag
  df[, nams] <- lapply(which(nums), function(id, df, n) clag(df[[id]], n = n),
                       df = df, n = n) ## apply clag to each numeric variable
  df ## return
}

lapply(dds, foo)

给出:

> lapply(dds, foo)
$A
            v1          v2           v3 dim1 dim2      v1.lag      v2.lag       v3.lag
1  -1.15107343  1.47671548 -0.146501739    A    J          NA          NA           NA
2  -1.61068272 -0.85397093 -1.240187604    A    J -1.15107343  1.47671548 -0.146501739
3  -1.23470282 -0.26194027  1.938344030    A    J -1.61068272 -0.85397093 -1.240187604
4  -0.57874043 -0.44600138  0.326069423    A    J -1.23470282 -0.26194027  1.938344030
5   0.16139066 -1.95804742 -0.744678169    A    J -0.57874043 -0.44600138  0.326069423
6  -1.01497027  0.36850034  1.532640065    A    J  0.16139066 -1.95804742 -0.744678169
7   0.72288058 -0.40115543 -0.686450596    A    J -1.01497027  0.36850034  1.532640065
8  -0.51300447  0.19686310  0.441649595    A    J  0.72288058 -0.40115543 -0.686450596
9   0.95439966 -2.03513002 -0.897784897    A    J -0.51300447  0.19686310  0.441649595
10 -1.36736081 -0.41040962 -0.459403176    A    J  0.95439966 -2.03513002 -0.897784897
11  0.59503846  0.28925760 -0.003095389    A    J -1.36736081 -0.41040962 -0.459403176
12 -0.37951869  0.49551357  0.269412108    A    J  0.59503846  0.28925760 -0.003095389
13 -0.52953401 -0.28433351  1.125505917    A    J -0.37951869  0.49551357  0.269412108
14 -1.73466020  0.25442637 -1.094139749    A    J -0.52953401 -0.28433351  1.125505917
15  0.08479137 -0.11688894 -1.034378216    A    J -1.73466020  0.25442637 -1.094139749
16 -2.45854464  0.15806266 -2.275995527    A    J  0.08479137 -0.11688894 -1.034378216
17  1.10663502  1.28587230  0.070334868    A    J -2.45854464  0.15806266 -2.275995527
18 -0.01945585  1.63659116 -0.137040232    A    J  1.10663502  1.28587230  0.070334868
19  0.59026606 -1.95724134 -0.480014930    A    J -0.01945585  1.63659116 -0.137040232
20 -0.32245933  1.35372005  1.348717525    A    J  0.59026606 -1.95724134 -0.480014930
21 -0.42560327 -1.30145328  2.020609480    A    J -0.32245933  1.35372005  1.348717525
22  1.19550777  0.18417336  0.099232994    A    J -0.42560327 -1.30145328  2.020609480
23  1.20198621  0.05926023 -0.171505810    A    J  1.19550777  0.18417336  0.099232994
24 -1.00667141  1.32441782  0.056696824    A    J  1.20198621  0.05926023 -0.171505810

$B
           v1          v2          v3 dim1 dim2     v1.lag      v2.lag      v3.lag
25  0.7878614  0.10354576 -0.69308980    B    J         NA          NA          NA
26  0.5824551  0.42319616  0.42734938    B    J  0.7878614  0.10354576 -0.69308980
27 -0.2769730  1.51559382 -0.64106570    B    J  0.5824551  0.42319616  0.42734938
28 -0.5736416 -1.58745816 -1.13274631    B    J -0.2769730  1.51559382 -0.64106570
29 -1.9082145 -0.26148604 -0.04699411    B    J -0.5736416 -1.58745816 -1.13274631
30 -1.6254549  0.39390814 -1.79993619    B    J -1.9082145 -0.26148604 -0.04699411
31  0.3963274  1.79667985  0.92873142    B    J -1.6254549  0.39390814 -1.79993619
32 -0.5889415 -0.04690351  1.43394978    B    J  0.3963274  1.79667985  0.92873142
33  0.4683819 -1.34023029  0.18749782    B    J -0.5889415 -0.04690351  1.43394978
34  0.7373052 -0.93470320 -1.14528378    B    J  0.4683819 -1.34023029  0.18749782
35 -0.7751348 -1.26533917  0.11246728    B    J  0.7373052 -0.93470320 -1.14528378
36  1.7786627 -0.19757164  0.14150980    B    J -0.7751348 -1.26533917  0.11246728
37  1.8570412 -2.15174901  1.07751105    B    K  1.7786627 -0.19757164  0.14150980
38  0.5128697  0.40112948 -0.94826274    B    K  1.8570412 -2.15174901  1.07751105
39  0.8710264 -0.59978467  0.54462858    B    K  0.5128697  0.40112948 -0.94826274
40 -0.3711512 -0.15632337  0.15832543    B    K  0.8710264 -0.59978467  0.54462858
41  1.4505624  0.20915835  2.59369653    B    K -0.3711512 -0.15632337  0.15832543
42  0.0871329  0.25440471  0.30096063    B    K  1.4505624  0.20915835  2.59369653
43 -0.7398342 -1.72678544  0.45534941    B    K  0.0871329  0.25440471  0.30096063
44  0.1953264 -0.60560630 -0.36884626    B    K -0.7398342 -1.72678544  0.45534941
45 -0.2702493  0.50747209 -0.50699830    B    K  0.1953264 -0.60560630 -0.36884626
46  0.2987449  0.46347722  1.20725190    B    K -0.2702493  0.50747209 -0.50699830
47 -0.5682779 -0.71470625 -0.07865078    B    K  0.2987449  0.46347722  1.20725190
48 -1.5291983  1.80092050 -1.73317395    B    K -0.5682779 -0.71470625 -0.07865078

$C
            v1           v2           v3 dim1 dim2      v1.lag       v2.lag       v3.lag
49  0.06095825 -0.518263220  0.510999371    C    K          NA           NA           NA
50  0.40077713  0.477989115  0.855752036    C    K  0.06095825 -0.518263220  0.510999371
51  0.06763037  0.802110426 -0.102536186    C    K  0.40077713  0.477989115  0.855752036
52 -0.90530986 -0.005452101 -0.089703589    C    K  0.06763037  0.802110426 -0.102536186
53 -0.79360209  0.299844218 -0.765164525    C    K -0.90530986 -0.005452101 -0.089703589
54  1.34050298 -1.093705314 -0.955952912    C    K -0.79360209  0.299844218 -0.765164525
55  0.45377712  0.054978470  0.382874895    C    K  1.34050298 -1.093705314 -0.955952912
56  0.95283101 -0.564193352  1.458002944    C    K  0.45377712  0.054978470  0.382874895
57  1.09157807 -1.351894599 -1.366084414    C    K  0.95283101 -0.564193352  1.458002944
58  2.71993062 -1.126272793  1.374046159    C    K  1.09157807 -1.351894599 -1.366084414
59 -0.04685281  0.423085481 -0.455903151    C    K  2.71993062 -1.126272793  1.374046159
60 -0.31055449  0.818291875  0.400386018    C    K -0.04685281  0.423085481 -0.455903151
61 -0.54904545  1.542272313  0.648135340    C    K -0.31055449  0.818291875  0.400386018
62 -0.72914142  1.495482707 -0.212135011    C    K -0.54904545  1.542272313  0.648135340
63 -0.27374611 -1.309254707 -0.005125047    C    K -0.72914142  1.495482707 -0.212135011
64  0.87439910 -2.666588138  1.043778597    C    K -0.27374611 -1.309254707 -0.005125047
65  1.07142042  0.446233778 -0.286784683    C    K  0.87439910 -2.666588138  1.043778597
66 -0.10431808  0.510820156  0.405309569    C    K  1.07142042  0.446233778 -0.286784683
67 -1.04006019 -0.041327622  1.202855549    C    K -0.10431808  0.510820156  0.405309569
68  0.41084794 -0.376796559 -1.147032471    C    K -1.04006019 -0.041327622  1.202855549
69  0.88329788 -0.344611311  1.862998306    C    K  0.41084794 -0.376796559 -1.147032471
70 -0.67916248  1.396061431  0.697517685    C    K  0.88329788 -0.344611311  1.862998306
71  3.55359528 -0.207825480 -0.949834845    C    K -0.67916248  1.396061431  0.697517685
72  0.11329113  0.294747300 -0.955891419    C    K  3.55359528 -0.207825480 -0.949834845

最后一点,组合步骤,保存上面的内容:

For the last bit, the combine step, save the above:

dds <- lapply(dds, foo)

然后使用 do.call() 将各个数据帧rbind() 放在一起,如:

then use do.call() to rbind() the individual data frames together, as in:

df2 <- do.call(rbind, dds)

给出:

> head(df2)
            v1         v2         v3 dim1 dim2     v1.lag     v2.lag     v3.lag
A.1 -1.1510734  1.4767155 -0.1465017    A    J         NA         NA         NA
A.2 -1.6106827 -0.8539709 -1.2401876    A    J -1.1510734  1.4767155 -0.1465017
A.3 -1.2347028 -0.2619403  1.9383440    A    J -1.6106827 -0.8539709 -1.2401876
A.4 -0.5787404 -0.4460014  0.3260694    A    J -1.2347028 -0.2619403  1.9383440
A.5  0.1613907 -1.9580474 -0.7446782    A    J -0.5787404 -0.4460014  0.3260694
A.6 -1.0149703  0.3685003  1.5326401    A    J  0.1613907 -1.9580474 -0.7446782

这篇关于使用“apply"处理 data.frames 列表函数族的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆