How to convert a for-loop to an lapply function for parallel testing purposes?


Question

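I have been researching the pros and cons of for-loops versus the apply() family of functions, and the answers are not clear-cut (whether apply() is faster than a for-loop depends on the situation). I would therefore like to test the various options against my actual data.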

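Below is a for-loop that looks very simple to me, but I am not sure how to replace it with lapply(). I believe lapply() is the right choice because the for-loop produces a list object.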

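The actual data I need to run this analysis on is a data frame with 2.5 million rows and more than 30 columns, which is why I want to run speed tests on the various options.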

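Any explanation would be very helpful. The examples I have found online come with little explanation, or the for-loop examples are overly complex. I would like to learn to use the apply() family well, because these functions seem more useful and easier to read than for-loops.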

Here is the simplified for-loop code with a sample data frame; it runs correctly for the purposes of this example:

# Set up data frame to perform migration analysis on:
data <- 
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    Flags = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0")
  )

# Function to set-up base table:
setTable <- function(data){
  df <- data.frame(matrix(NA, ncol=length(unique(data$Flags)), nrow=length(unique(data$Flags))))
  row.names(df) <- unique(data$Flags)
  names(df) <- unique(data$Flags)
  return(df)
}

# Function to complete migration table with for-loop:
migration <- function(data, from=1, to=3){
  df <- setTable(data)
  for (i in unique(data$ID)){
    id_from <- as.character(data$Flags[(data$ID == i & data$Period == from)])
    id_to <- as.character(data$Flags[data$ID == i & data$Period == to])
    column <- which(names(df) == id_from)
    row <- which(row.names(df) == id_to)
    df[row, column] <- ifelse(is.na(df[row, column]), 1, df[row, column] + 1)
  }
  return(df)
}

# Now to run the function:
test1 <- migration(data, from=1, to=3)
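
A minimal sketch of how such a speed test could be set up in base R, assuming synthetic data shaped like the example above is an acceptable stand-in for the real table. The helpers `make_data` and `time_once` are hypothetical names introduced here for illustration; any candidate function, such as `migration`, could be passed to `time_once` in place of the placeholder tabulation.

```r
# Hypothetical helper: build synthetic data shaped like the example,
# with n_ids IDs observed over periods 1..3 and randomly drawn flags.
make_data <- function(n_ids, seed = 42) {
  set.seed(seed)
  data.frame(
    ID = rep(seq_len(n_ids), each = 3),
    Period = rep(1:3, times = n_ids),
    Values = runif(n_ids * 3),
    Flags = sample(c("X0", "X1", "X2"), n_ids * 3, replace = TRUE)
  )
}

# Hypothetical helper: elapsed wall-clock seconds for one call of f on data.
time_once <- function(f, data) {
  unname(system.time(f(data))["elapsed"])
}

big <- make_data(1000)

# Placeholder workload: swap in the function under test,
# e.g. function(d) migration(d, from = 1, to = 3).
elapsed <- time_once(function(d) table(d$Flags), big)
```

A single run is noisy, so repeating the call several times and growing `n_ids` step by step gives a better picture of how each option scales.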

Recommended answer

Edit: wrapped in a function that allows from and to to be specified:

library(data.table)

DF <- data.frame(
  ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9, 3, 6, 9),
  Flags = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0", "X2","X1","X0")
)

migration <- function(DT, from=1, to=3){
  setDT(DT)
  unique_flags <- unique(DT$Flags)
  all_flags <- setDT(expand.grid(list(from_flag = unique_flags, to_flag = unique_flags)))
  
  dcast(DT[, .(from_flag = Flags[Period == from], to_flag = Flags[Period == to]), by = ID][
    ,.N, c("from_flag", "to_flag")][
      all_flags, on = c("from_flag", "to_flag")], to_flag ~ from_flag, value.var = "N")
}

migration(DF, 1, 3)

When it comes to speed in R, you can almost always rely on library(data.table):

library(data.table)

DT <- setDT(data.frame(
  ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9, 3, 6, 9),
  Flags = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0", "X2","X1","X0")
))

unique_flags <- unique(DT$Flags)
all_flags <- setDT(expand.grid(list(first_flag = unique_flags, last_flag = unique_flags)))

resultDT <- dcast(DT[, .(first_flag = first(Flags), last_flag = last(Flags)), by = ID][
  ,.N, c("first_flag", "last_flag")][
    all_flags, on = c("first_flag", "last_flag")], last_flag ~ first_flag, value.var = "N")

print(resultDT)

Step by step:

library(data.table)

DT <- setDT(data.frame(
  ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9, 3, 6, 9),
  Flags = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0", "X2","X1","X0")
))

unique_flags <- unique(DT$Flags)
all_flags <- setDT(expand.grid(list(first_flag = unique_flags, last_flag = unique_flags)))

resultDT <- DT[, .(first_flag = first(Flags), last_flag = last(Flags)), by = ID] # find relevant flags
resultDT <- resultDT[,.N, c("first_flag", "last_flag")] # count transitions
resultDT <- resultDT[all_flags, on = c("first_flag", "last_flag")] # merge all combinations
resultDT <- dcast(resultDT, last_flag ~ first_flag, value.var = "N") # dcast
print(resultDT)
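
As a cross-check of the steps above, the same transition counts can be sketched in base R with table(). This is not part of the data.table solution; it assumes every ID has exactly one row for the from period and one for the to period, and it reports combinations that never occur as 0 rather than NA.

```r
DF <- data.frame(
  ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Flags = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0", "X2","X1","X0")
)

# Rows holding the starting and ending flag of each ID
from_rows <- DF[DF$Period == 1, ]
to_rows <- DF[DF$Period == 3, ]

# Align the "to" rows with the "from" rows by ID
to_rows <- to_rows[match(from_rows$ID, to_rows$ID), ]

# Fix the factor levels so every flag appears as a row and a column
flag_levels <- unique(DF$Flags)
counts <- table(
  to_flag = factor(to_rows$Flags, levels = flag_levels),
  from_flag = factor(from_rows$Flags, levels = flag_levels)
)
print(counts)
```

Rows are the to flag and columns are the from flag, matching the `last_flag ~ first_flag` layout of the dcast call.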

As for lapply, here is what you could do (I prefer data.table):

# Set up data frame to perform migration analysis on:
input_data <- 
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    Flags = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0")
  )

# Function to set-up base table:
setTable <- function(data){
  DF <- data.frame(matrix(NA, ncol=length(unique(data$Flags)), nrow=length(unique(data$Flags))))
  row.names(DF) <- unique(data$Flags)
  names(DF) <- unique(data$Flags)
  return(DF)
}

# Function to complete migration table with lapply:
migration <- function(data, from=1, to=3){
  DF <- setTable(data)
  # Iterate over the ID values themselves (not their positions), as the for-loop does
  lapply(unique(data$ID), function(i){
    id_from <- as.character(data$Flags[(data$ID == i & data$Period == from)])
    id_to <- as.character(data$Flags[data$ID == i & data$Period == to])
    column <- which(names(DF) == id_from)
    row <- which(row.names(DF) == id_to)
    # <<- updates DF in the enclosing function; this relies on sequential execution
    DF[row, column] <<- ifelse(is.na(DF[row, column]), 1, DF[row, column] + 1)
  })
  return(DF)
}

# Now to run the function:
test1 <- migration(input_data, from=1, to=3)
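
Because the version above updates a shared table with `<<-`, its iterations cannot be handed to parallel workers, which would each modify their own copy. A minimal sketch of a side-effect-free variant follows, assuming every ID has exactly one row for each of the two periods: each iteration only returns its from/to pair, and the counting happens afterwards. The name `migration_lapply` is hypothetical, and unlike the original this returns a table with 0 instead of NA for unseen combinations.

```r
input_data <- data.frame(
  ID = c(1,1,1,2,2,2,3,3,3),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
  Flags = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0")
)

# Hypothetical side-effect-free variant of migration()
migration_lapply <- function(data, from = 1, to = 3) {
  flags <- unique(as.character(data$Flags))
  # Each iteration returns a named pair and touches no shared state
  pairs <- lapply(unique(data$ID), function(id) {
    rows <- data$ID == id
    c(from = as.character(data$Flags[rows & data$Period == from]),
      to = as.character(data$Flags[rows & data$Period == to]))
  })
  # Stack the pairs into a two-column character matrix
  pairs <- do.call(rbind, pairs)
  # Count transitions: rows are the "to" flag, columns the "from" flag
  table(to = factor(pairs[, "to"], levels = flags),
        from = factor(pairs[, "from"], levels = flags))
}

test2 <- migration_lapply(input_data, from = 1, to = 3)
print(test2)
```

Since the iterations are now independent, the lapply call could in principle be swapped for parallel::mclapply on Unix-like systems. Each iteration still scans the whole data frame, so on millions of rows the data.table approach is likely to remain much faster.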
