Complex data.table subset and vectorised manipulation

Views: 40 | Published: 2021/4/28 19:39:34 | Tags: r, data.table


Problem description

OK, I have a complex function built using data.frames, and in trying to speed it up I've turned to data.table. I'm totally new to this, so I'm quite befuddled. Anyhow, I've made a much simpler toy example of what I want to do, but I cannot work out how to translate it into data.table format. Here is the example in data.frame form:

    library(data.table)

    rows <- 10
    data1 <- data.frame(id = 1:rows,
                        a = seq(0.2, 0.55, length.out = rows),
                        b = seq(0.35, 0.7, length.out = rows),
                        c = seq(0.4, 0.83, length.out = rows),
                        d = seq(0.6, 0.87, length.out = rows),
                        e = seq(0.7, 0.99, length.out = rows),
                        f = seq(0.52, 0.90, length.out = rows))
    DT1 <- data.table(data1)  # for later

    data2 <- data.frame(id = 3:1,
                        a = rep(3, 3),
                        d = rep(2, 3),
                        f = rep(1, 3))
    m.names <- c("a", "d", "f")

    # Rows of data1 that match each data2 id, in data2's order
    idx <- match(data2$id, data1$id)
    # data2 is already in that order, so it is added as-is
    data1[idx, m.names] <- data1[idx, m.names] + data2[, m.names]

Note that in the last step I want to perform addition between the pre-existing figures and the new data, and that it is vectorised across several columns.

In data.table format I've only gotten this far:

    DT1[id %in% data2$id, m.names, with=FALSE]

This selects the values I want to add to, but after that I am lost. I would appreciate any help!
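A side note on that subset, as a minimal sketch with made-up values rather than anything from the original function: `id %in% data2$id` is a logical filter, so the rows come back in DT1's order, whereas `match()` returns them in data2's order. Since data2 lists its ids as 3, 2, 1, the two are not interchangeable when the result is added row by row. The names `by_in` and `by_match` are introduced here for illustration only.

```r
library(data.table)

# Toy tables mirroring the example above (values are illustrative)
DT1 <- data.table(id = 1:10, a = seq(0.2, 0.55, length.out = 10))
data2 <- data.frame(id = 3:1, a = c(30, 20, 10))

# %in% is a logical filter: rows come back in DT1's order
by_in <- DT1[id %in% data2$id]

# match() gives row positions, one per data2 id, in data2's order
by_match <- DT1[match(data2$id, id)]

by_in$id     # 1 2 3
by_match$id  # 3 2 1
```

If data2 happens to be sorted by id in the same direction as DT1, both forms line up; otherwise only the `match()` version pairs each data2 row with the right DT1 row.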

OK, I've figured out part of it: I can use the last line of code above to do the vectorised addition, using data2 to store the summed values, as follows:

    # match() rather than %in%, so the DT1 rows come back in data2's id order
    data2[, m.names] <- data2[, m.names] + data.frame(DT1[match(data2$id, id), m.names, with = FALSE])

Even with 2.5 million rows in DT1, 10,000 rows in data2 and 6 matching columns, this takes only 0.004 sec, but I still need to assign the new data2 values back to the appropriate, dynamically specified columns in data1.
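One possible way to do that write-back, sketched under the assumption that every id in data2 appears exactly once in DT1, is data.table's `set()`, which assigns by reference to a column given by name. This is only an illustration on a trimmed copy of the toy data, not a tested solution for the full-size tables; `idx` and `before` are names introduced here.

```r
library(data.table)

rows <- 10
DT1 <- data.table(id = 1:rows,
                  a = seq(0.2, 0.55, length.out = rows),
                  d = seq(0.6, 0.87, length.out = rows),
                  f = seq(0.52, 0.90, length.out = rows))
data2 <- data.frame(id = 3:1, a = rep(3, 3), d = rep(2, 3), f = rep(1, 3))
m.names <- c("a", "d", "f")
before <- copy(DT1)  # untouched copy, kept only for comparison

# Row positions in DT1 for each data2 id, in data2's order.
# Assumption: every data2 id exists in DT1, so idx has no NA.
idx <- match(data2$id, DT1$id)
stopifnot(!anyNA(idx))

# set() updates DT1 in place, one dynamically named column at a time
for (col in m.names) {
  set(DT1, i = idx, j = col, value = DT1[[col]][idx] + data2[[col]])
}
```

Because `set()` modifies DT1 by reference, no intermediate copy of the large table is made, and the column names can stay in a character vector such as `m.names`.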
