Speed up this loop to create dummy columns with data.table and set in R


Question

I have a data table and I want to create a new column for each unique day, then assign a 1 in each row where the day matches the column name.

I have done this using a for loop, but I was wondering whether there is a way to optimise it using data.table and set?

Here is an example:

library(data.table)

dt <- data.table(Week_Day = c("Monday", "Tuesday", "Wednesday",
                          "Thursday", "Friday", "Saturday", "Sunday"))

Day <- unique(dt$Week_Day)
for (i in 1:length(Day)) {
    if (Day[i] != "Sunday") {
        dt[, Day[i] := ifelse(Week_Day == Day[i], 1, 0)]
    }
}

My table has 298k rows, and although the loop doesn't take long to execute (timing below), it's part of a long script with quite a few inefficient loops, so I am trying to get the overall run time down.

Run time:

user  system elapsed
0.99    0.06    1.05

Thanks in advance.

Recommended answer

Here's a different approach that performs better, on my machine, than the original approach in the question.

1) Get the unique days except Sunday:

Day <- setdiff(dt$Week_Day, "Sunday")

2) Initialize the new columns with 0:

dt[, (Day) := 0L]

3) Update with 1s by reference in a loop:

for(x in Day) {
  set(dt, i = which(dt[["Week_Day"]] == x), j = x, value = 1L)
}
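
For reference, the three steps above can be run end to end on the small example from the question. The sketch below also includes a loop-free variant that is not part of the original answer: it builds all the columns in a single `:=` call with `lapply`. The names `dt_loop` and `dt_vec` are placeholders chosen for this illustration, and no claim is made here about which variant is faster on a given machine.

```r
library(data.table)

# Small example table from the question
dt_loop <- data.table(Week_Day = c("Monday", "Tuesday", "Wednesday",
                                   "Thursday", "Friday", "Saturday", "Sunday"))
dt_vec <- copy(dt_loop)

# Steps 1-3 from the answer
Day <- setdiff(dt_loop$Week_Day, "Sunday")
dt_loop[, (Day) := 0L]
for (x in Day) {
  set(dt_loop, i = which(dt_loop[["Week_Day"]] == x), j = x, value = 1L)
}

# Alternative (not from the answer): one := call, one comparison per day
dt_vec[, (Day) := lapply(Day, function(d) as.integer(Week_Day == d))]

# Compare the dummy columns produced by the two variants
all.equal(dt_loop, dt_vec)
```

Both variants modify the table by reference, so `copy()` is needed whenever the original table must be kept unchanged.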






Simple performance comparison:

dt1 <- data.table(Week_Day = sample(c("Monday", "Tuesday", "Wednesday",
                              "Thursday", "Friday", "Saturday", "Sunday"), 3e5, TRUE))

dt2 <- copy(dt1)


# set() approach from this answer
system.time({
  Day <- setdiff(dt1$Week_Day, "Sunday")
  dt1[, (Day) := 0L]
  for(x in Day) {
    set(dt1, i = which(dt1[["Week_Day"]] == x), j = x, value = 1L)
  }
})
#       user      system     elapsed 
#      0.029       0.003       0.032 

# Original loop from the question, using integer 1L/0L so the results are comparable
system.time({
  Day <- unique(dt2$Week_Day)
  for (i in 1:length(Day)) {
    if (Day[i] != "Sunday") {
      dt2[, Day[i] := ifelse(Week_Day == Day[i], 1L, 0L)]
    }
  }
})
#       user      system     elapsed 
#      0.138       0.070       0.210 


all.equal(dt1, dt2)
#[1] TRUE
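
A result like this can also be checked directly, without relying on a second implementation. The sketch below rebuilds a small random table (the size `1e4`, the seed, and the names `dt_chk`, `cols_ok` and `rows_ok` are arbitrary choices for illustration), applies the `set()` approach, and then tests that each dummy column matches the plain comparison and that each row has a single 1 unless the day is Sunday.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only to make this sketch reproducible
days <- c("Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday", "Sunday")
dt_chk <- data.table(Week_Day = sample(days, 1e4, TRUE))

# Same three steps as in the answer
Day <- setdiff(dt_chk$Week_Day, "Sunday")
dt_chk[, (Day) := 0L]
for (x in Day) {
  set(dt_chk, i = which(dt_chk[["Week_Day"]] == x), j = x, value = 1L)
}

# Each dummy column is compared with the direct integer comparison
cols_ok <- vapply(Day, function(d) {
  identical(dt_chk[[d]], as.integer(dt_chk[["Week_Day"]] == d))
}, logical(1))

# Row totals are compared with 1 for Monday-Saturday and 0 for Sunday
row_totals <- rowSums(as.matrix(dt_chk[, Day, with = FALSE]))
rows_ok <- all(row_totals == as.integer(dt_chk[["Week_Day"]] != "Sunday"))

all(cols_ok) && rows_ok
```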
