How to efficiently subset a dataframe into several chunks to be passed to a list of lists


Question

I would appreciate help with efficiently subsetting a data frame into several chunks, based on the variable imput, so that each chunk can be turned into a list and the results collected into a list of lists.

My code below works for a few subsets, but I have 100 subsets to create, and repeating the same block for each one becomes too long and difficult to manage. I therefore need a more efficient approach that produces the same outcome with far less code.

The approach imputation_groups <- split(dat, dat$imput) discussed here lets me split my data into a list of chunks (data frames) based on imput. What I still need is a way to extract variables from each chunk, build a list from each chunk, and then collect those lists into a list of lists. I am also not sure how to create the variable N (that is, N <- nrow(dT_P1), N <- nrow(dT_P2), N <- nrow(dT_P3), N <- nrow(dT_P4), N <- nrow(dT_P5)) for each of the lists created from the chunks.
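
To see what split() returns before building the nested list, a small sketch may help. The data frame toy below is a made-up stand-in for dat (its name and values are assumptions, not the real data). Each element of the result is a data frame named after its imput value, so the row count N can be taken for every chunk in one call rather than once per hand-made subset.

```r
# Hypothetical toy data standing in for `dat`: 2 ids x 3 imputations
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  imput = c(1, 2, 3, 1, 2, 3),
  Pass  = c(10, 10, 10, 20, 20, 20)
)

# split() returns a named list with one data frame per value of `imput`
groups <- split(toy, toy$imput)

# The list names are the imput values, stored as character strings
names(groups)

# N for every chunk at once, instead of one nrow() call per subset
n_per_chunk <- sapply(groups, nrow)
n_per_chunk
```

The same pattern should carry over to dat with 100 imputations, since nothing in it depends on the number of groups.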

dat <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 
3, 3, 4, 4, 4, 4, 4), imput = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 
1, 2, 3, 4, 5, 1, 2, 3, 4, 5), A = c(1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), B = c(1, 1, 1, 1, 1, 0, 
0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0), Pass = c(278, 278, 
278, 278, 278, 100, 100, 100, 100, 100, 153, 153, 153, 153, 153, 
79, 79, 79, 79, 79), Fail = c(740, 743, 742, 743, 740, 7581, 
7581, 7581, 7581, 7581, 1231, 1232, 1235, 1235, 1232, 1731, 1732, 
1731, 1731, 1731), Weights_1 = c(4, 3, 4, 3, 3, 1, 2, 1, 2, 1, 
12, 12, 11, 12, 12, 3, 5, 3, 3, 3), Weights_2 = c(3, 3, 3, 3, 
3, 1, 1, 1, 1, 1, 12, 12, 12, 12, 12, 3, 3, 3, 3, 3), Weights_3 = c(4, 
3, 3, 3, 3, 1, 2, 1, 1, 1, 12, 12, 11, 12, 12, 3, 3, 3, 3, 3), 
    Weights_4 = c(3, 3, 4, 3, 3, 1, 1, 1, 2, 1, 12, 12, 13, 12, 
    12, 3, 2, 3, 3, 3), Weights_5 = c(3, 3, 3, 3, 3, 1, 0, 1, 
    1, 1, 12, 12, 12, 12, 12, 3, 3, 3, 3, 3), Weights_6 = c(4, 
    3, 3, 3, 3, 1, 1, 1, 1, 1, 12, 12, 12, 12, 12, 3, 3, 3, 3, 
    3), Weights_7 = c(3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 12, 12, 12, 
    12, 12, 3, 3, 3, 3, 3), Weights_8 = c(3, 3, 3, 3, 3, 1, 1, 
    1, 1, 1, 15, 12, 12, 12, 12, 3, 3, 3, 3, 3), Weights_9 = c(3, 
    3, 3, 4, 3, 1, 1, 1, 1, 1, 12, 12, 12, 12, 12, 2, 3, 3, 3, 
    3), Weights_10 = c(3, 3, 4, 3, 3, 1, 1, 1, 1, 1, 12, 10, 
    12, 12, 12, 3, 3, 3, 3, 3)), class = "data.frame", row.names = c(NA, 
-20L))

My approach:

##subsetting based on `imput`

    ##imput = `1`
    dT_P1<- dat[dat$imput == '1',]

    N <- nrow(dT_P1)
    C <-ncol(dT_P1)
    ncases <- dT_P1$Pass
    nn <- dT_P1$Fail + dT_P1$Pass
    A <- dT_P1$A
    B <- dT_P1$B
    id <- dT_P1$id
    imput <- dT_P1$imput
    w_1 <- dT_P1$Weights_1
    w_2 <- dT_P1$Weights_2
    w_3 <- dT_P1$Weights_3
    w_4 <- dT_P1$Weights_4
    w_5 <- dT_P1$Weights_5
    w_6 <- dT_P1$Weights_6
    w_7 <- dT_P1$Weights_7
    w_8 <- dT_P1$Weights_8
    w_9 <- dT_P1$Weights_9
    w_10 <- dT_P1$Weights_10

    dat1 <- list (N = N, 
              ncases = ncases, A = A, B = B, id = id, P = imput, nn = nn,
              weights = cbind(w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_10))


    ##imput = `2`
    dT_P2<- dat[dat$imput == '2',]

    N <- nrow(dT_P2)
    C <-ncol(dT_P2)
    ncases <- dT_P2$Pass
    nn <- dT_P2$Fail + dT_P2$Pass
    A <- dT_P2$A
    B <- dT_P2$B
    id <- dT_P2$id
    imput <- dT_P2$imput
    w_1 <- dT_P2$Weights_1
    w_2 <- dT_P2$Weights_2
    w_3 <- dT_P2$Weights_3
    w_4 <- dT_P2$Weights_4
    w_5 <- dT_P2$Weights_5
    w_6 <- dT_P2$Weights_6
    w_7 <- dT_P2$Weights_7
    w_8 <- dT_P2$Weights_8
    w_9 <- dT_P2$Weights_9
    w_10 <- dT_P2$Weights_10

    dat2 <- list (N = N, 
              ncases = ncases, A = A, B = B, id = id, P = imput, nn = nn,
              weights = cbind(w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_10))

    ##imput = `3`
    dT_P3<- dat[dat$imput == '3',]

    N <- nrow(dT_P3)
    C <-ncol(dT_P3)
    ncases <- dT_P3$Pass
    nn <- dT_P3$Fail + dT_P3$Pass
    A <- dT_P3$A
    B <- dT_P3$B
    id <- dT_P3$id
    imput <- dT_P3$imput
    w_1 <- dT_P3$Weights_1
    w_2 <- dT_P3$Weights_2
    w_3 <- dT_P3$Weights_3
    w_4 <- dT_P3$Weights_4
    w_5 <- dT_P3$Weights_5
    w_6 <- dT_P3$Weights_6
    w_7 <- dT_P3$Weights_7
    w_8 <- dT_P3$Weights_8
    w_9 <- dT_P3$Weights_9
    w_10 <- dT_P3$Weights_10

    dat3 <- list (N = N, 
              ncases = ncases, A = A, B = B, id = id, P = imput, nn = nn,
              weights = cbind(w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_10))

    ##imput = `4`
    dT_P4<- dat[dat$imput == '4',]

    N <- nrow(dT_P4)
    C <-ncol(dT_P4)
    ncases <- dT_P4$Pass
    nn <- dT_P4$Fail + dT_P4$Pass
    A <- dT_P4$A
    B <- dT_P4$B
    id <- dT_P4$id
    imput <- dT_P4$imput
    w_1 <- dT_P4$Weights_1
    w_2 <- dT_P4$Weights_2
    w_3 <- dT_P4$Weights_3
    w_4 <- dT_P4$Weights_4
    w_5 <- dT_P4$Weights_5
    w_6 <- dT_P4$Weights_6
    w_7 <- dT_P4$Weights_7
    w_8 <- dT_P4$Weights_8
    w_9 <- dT_P4$Weights_9
    w_10 <- dT_P4$Weights_10

    dat4 <- list (N = N, 
              ncases = ncases, A = A, B = B, id = id, P = imput, nn = nn,
              weights = cbind(w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_10))

    ##imput = `5`
    dT_P5<- dat[dat$imput == '5',]

    N <- nrow(dT_P5)
    C <-ncol(dT_P5)
    ncases <- dT_P5$Pass
    nn <- dT_P5$Fail + dT_P5$Pass
    A <- dT_P5$A
    B <- dT_P5$B
    id <- dT_P5$id
    imput <- dT_P5$imput
    w_1 <- dT_P5$Weights_1
    w_2 <- dT_P5$Weights_2
    w_3 <- dT_P5$Weights_3
    w_4 <- dT_P5$Weights_4
    w_5 <- dT_P5$Weights_5
    w_6 <- dT_P5$Weights_6
    w_7 <- dT_P5$Weights_7
    w_8 <- dT_P5$Weights_8
    w_9 <- dT_P5$Weights_9
    w_10 <- dT_P5$Weights_10

    dat5 <- list (N = N, 
              ncases = ncases, A = A, B = B, id = id, P = imput, nn = nn,
              weights = cbind(w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_10))

##creating the list of lists:

    mydatalist <- list(dat1, dat2, dat3, dat4, dat5) 

Recommended answer

You can:

  1. first split your data.frame using split
  2. then use lapply to apply a function to your list of subsets

Here is an example:

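# Split the data frame into one chunk per value of imput
l <- split(dat, dat$imput)

# Turn one chunk (a data frame) into the list of variables that is needed
fun <- function(x) {
  # Select the weight columns; drop = FALSE keeps a data frame
  # even if only one column matches
  w <- x[, grep('^Weights_', colnames(x)), drop = FALSE]
  # Rename to w_1, w_2, ... without hard-coding the number of columns
  colnames(w) <- paste0('w_', seq_len(ncol(w)))
  w <- data.matrix(w)
  list(N = nrow(x),
       C = ncol(x),
       ncases = x$Pass,
       A = x$A,
       B = x$B,
       id = x$id,
       P = x$imput,
       nn = x$Fail + x$Pass,
       weights = w)
}

# Apply the function to every chunk; the result is a list of lists
mydatalist <- lapply(l, fun)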
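
As a quick check of the result shape, the sketch below rebuilds the same split-then-lapply pattern on a small made-up data frame. The names toy, chunk_to_list, toy_list and positional, and all the values, are assumptions for illustration only. It shows how individual chunks can be reached: lapply() keeps the names produced by split(), so elements can be indexed by the imput value, while unname() gives a plain positional list comparable to list(dat1, dat2, ...).

```r
# Hypothetical toy data: 2 ids x 2 imputations, 2 weight columns
toy <- data.frame(
  id        = c(1, 1, 2, 2),
  imput     = c(1, 2, 1, 2),
  A         = c(1, 1, 0, 0),
  B         = c(1, 1, 0, 0),
  Pass      = c(278, 278, 100, 100),
  Fail      = c(740, 743, 7581, 7581),
  Weights_1 = c(4, 3, 1, 2),
  Weights_2 = c(3, 3, 1, 1)
)

# Hypothetical helper: convert one chunk into a list of variables
chunk_to_list <- function(x) {
  w <- data.matrix(x[, grep("^Weights_", colnames(x)), drop = FALSE])
  colnames(w) <- paste0("w_", seq_len(ncol(w)))
  list(N = nrow(x), ncases = x$Pass, A = x$A, B = x$B, id = x$id,
       P = x$imput, nn = x$Fail + x$Pass, weights = w)
}

toy_list <- lapply(split(toy, toy$imput), chunk_to_list)

# Index by the imput value (as a character string)
toy_list[["2"]]$N
toy_list[["2"]]$nn

# The weights element is a matrix with one row per observation in the chunk
dim(toy_list[["1"]]$weights)

# Drop the names to get a purely positional list of lists
positional <- unname(toy_list)
positional[[1]]$ncases
```

Whether to keep the names is a design choice: named elements make it easier to trace a chunk back to its imputation, while some downstream functions may expect an unnamed list.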
