R plyr, data.table, apply certain columns of data.frame

Views: 32; Published: 2021/11/16 23:10:29; Tags: r, data.table, plyr, apply

Problem description

I am looking for ways to speed up my code. I am looking into the apply/ply methods as well as data.table. Unfortunately, I am running into problems.

Here is a small sample data set:

ids1   <- c(1, 1, 1, 1, 2, 2, 2, 2)
ids2   <- c(1, 2, 3, 4, 1, 2, 3, 4)
chars1 <- c("aa", " bb ", "__cc__", "dd  ", "__ee", NA,NA, "n/a")
chars2 <- c("vv", "_ ww_", "  xx  ", "yy__", "  zz", NA, "n/a", "n/a")
data   <- data.frame(col1 = ids1, col2 = ids2,
                     col3 = chars1, col4 = chars2,
                     stringsAsFactors = FALSE)

Here is a solution using loops:

library("plyr")
cols_to_fix <- c("col3","col4")
for (i in 1:length(cols_to_fix)) {
  data[,cols_to_fix[i]] <- gsub("_", "", data[,cols_to_fix[i]])
  data[,cols_to_fix[i]] <- gsub(" ", "", data[,cols_to_fix[i]])
  data[,cols_to_fix[i]] <- ifelse(data[,cols_to_fix[i]]=="n/a", NA, data[,cols_to_fix[i]])
}

I initially looked at ddply, but some methods I want to use only take vectors. Hence, I cannot figure out how to do ddply across just certain columns one-by-one.

Also, I have been looking at laply, but I want to return the original data.frame with the changes. Can anyone help me? Thank you.

Based on the suggestions from earlier, here is what I tried to use from the plyr package.

Option 1:

data[,cols_to_fix] <- aaply(data[,cols_to_fix],2, function(x){
   x <- gsub("_", "", x,perl=TRUE)
   x <- gsub(" ", "", x,perl=TRUE)
   x <- ifelse(x=="n/a", NA, x)
},.progress = "text",.drop = FALSE)

Option 2:

data[,cols_to_fix] <- alply(data[,cols_to_fix],2, function(x){
   x <- gsub("_", "", x,perl=TRUE)
   x <- gsub(" ", "", x,perl=TRUE)
   x <- ifelse(x=="n/a", NA, x)
},.progress = "text")

Option 3:

data[,cols_to_fix] <- adply(data[,cols_to_fix],2, function(x){
   x <- gsub("_", "", x,perl=TRUE)
   x <- gsub(" ", "", x,perl=TRUE)
   x <- ifelse(x=="n/a", NA, x)
},.progress = "text")

None of these gives me the correct answer.

apply works great, but my data is very large, and the progress bars from the plyr package would be very nice to have. Thanks again.

Solution

Here's a data.table solution using set.

require(data.table)
DT <- data.table(data)
for (j in cols_to_fix) {
    set(DT, i=NULL, j=j, value=gsub("[ _]", "", DT[[j]], perl=TRUE))
    set(DT, i=which(DT[[j]] == "n/a"), j=j, value=NA_character_)
}

DT
#    col1 col2 col3 col4
# 1:    1    1   aa   vv
# 2:    1    2   bb   ww
# 3:    1    3   cc   xx
# 4:    1    4   dd   yy
# 5:    2    1   ee   zz
# 6:    2    2   NA   NA
# 7:    2    3   NA   NA
# 8:    2    4   NA   NA

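The first set() call reads: in DT, for all rows (i = NULL) and column j, assign the value gsub(..).
The second set() call reads: in DT, for the rows where the condition holds (i = which(..)) and column j, assign NA_character_. Both calls update DT by reference, so the table is not copied.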
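
To make the roles of i and j concrete, here is a minimal sketch on a toy table. The names toy and txt are made up for illustration and are not part of the original answer; the two calls mirror the ones in the loop above.

```r
library(data.table)

# Toy table; `toy` and `txt` are illustrative names only.
toy <- data.table(id = 1:4, txt = c("a_b", "n/a", NA, " c "))

# i = NULL: every row of column "txt" is overwritten with the cleaned values.
set(toy, i = NULL, j = "txt",
    value = gsub("[ _]", "", toy[["txt"]], perl = TRUE))

# i = row indices: only rows whose value equals "n/a" are overwritten.
# which() drops the NA results of the comparison, so existing NAs are untouched.
set(toy, i = which(toy[["txt"]] == "n/a"), j = "txt", value = NA_character_)

print(toy)
```

Because set() works by reference, toy itself is changed and nothing needs to be assigned back.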

Note: Using PCRE (perl=TRUE) can give a nice speed-up, especially on larger vectors.
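
The size of that speed-up depends on the data and the R build, so it is worth measuring on your own input. The following is a rough sketch using only base R; the synthetic vector x and its length are arbitrary assumptions, not taken from the original post.

```r
# Synthetic input; the values and the length 1e5 are arbitrary choices.
set.seed(1)
x <- sample(c("aa", " bb ", "__cc__", "dd  ", "n/a", NA), 1e5, replace = TRUE)

# Time the same substitution with the default regex engine and with PCRE.
t_default <- system.time(res_default <- gsub("[ _]", "", x))
t_perl    <- system.time(res_perl    <- gsub("[ _]", "", x, perl = TRUE))

# Both engines should return the same cleaned strings for this pattern.
stopifnot(identical(res_default, res_perl))

# Elapsed seconds for each variant; the numbers will vary by machine.
elapsed <- c(default = t_default[["elapsed"]], perl = t_perl[["elapsed"]])
print(elapsed)
```

A single system.time() run is noisy, so repeating the measurement a few times gives a more reliable picture.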
