Complex data.table subset and vectorised manipulation

Question

OK, I have a complex function built using data.frames, and in trying to speed it up I've turned to data.table. I'm totally new to this, so I'm quite befuddled. Anyhow, I've made a much simpler toy example of what I want to do, but I cannot work out how to translate it into data.table form. Here is the example in data.frame form:

    library(data.table)

    rows <- 10
    data1 <- data.frame(id = 1:rows,
                        a = seq(0.2, 0.55, length.out = rows),
                        b = seq(0.35, 0.7, length.out = rows),
                        c = seq(0.4, 0.83, length.out = rows),
                        d = seq(0.6, 0.87, length.out = rows),
                        e = seq(0.7, 0.99, length.out = rows),
                        f = seq(0.52, 0.90, length.out = rows))
    DT1 <- data.table(data1)  # for later

    data2 <- data.frame(id = 3:1,
                        a = rep(3, 3),
                        d = rep(2, 3),
                        f = rep(1, 3))
    m.names <- c("a", "d", "f")

    # Rows of data1 that match data2$id, in the order of data2
    idx <- match(data2$id, data1$id)
    data1[idx, m.names] <- data1[idx, m.names] + data2[, m.names]

Note that in the last step I want to perform addition between the pre-existing figures and the new data, vectorised across several columns.

In data.table form I've only gotten this far:

    DT1[id %in% data2$id, m.names, with=FALSE]

This selects the values I want to add to, but after that I am lost. I would appreciate any help!

OK, I've figured out part of it: I can use the last line of code above to do the vectorised addition, using data2 to store the summed values, as follows:

    data2[,m.names] <- data2[,m.names] + data.frame(DT1[id %in% data2$id, m.names, with=FALSE])

Even with 2.5 million rows in DT1, 10,000 rows in data2 and 6 matching columns, this takes only 0.004 sec, but I still need to assign the new data2 values back to the appropriate, dynamically specified columns in data1.
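One thing worth checking with the `%in%` approach is row alignment: a logical subset returns rows in the order they appear in the table, not in the order of the lookup ids. The sketch below uses small made-up tables (`DT` and `upd` are hypothetical names, not from the question) to show the difference between `%in%` and `match()`.

```r
library(data.table)

# Hypothetical toy data, only to illustrate row order
DT  <- data.table(id = 1:5, a = c(10, 20, 30, 40, 50))
upd <- data.frame(id = 3:1, a = c(3, 2, 1))

# %in% keeps the row order of DT (ids 1, 2, 3), not the order of upd$id
by_in <- DT[id %in% upd$id, a]

# match() gives row positions in the order of upd$id (ids 3, 2, 1)
by_match <- DT[match(upd$id, id), a]

print(by_in)
print(by_match)
```

If the ids in the update table are not sorted the same way as in the main table, adding the `%in%` result to the update table element by element would pair up the wrong rows, so `match()` (or a join) is the safer choice.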

Recommended answer

OK, thanks to @David Arenburg for his suggestion. I've modified it slightly to arrive at my preferred solution:

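    # Build "a = a + i.a, d = d + i.d, f = f + i.f" from the column names;
    # the i. prefix refers to the columns of data2 in the join
    text <- paste0(m.names, " = ", m.names, " + i.", m.names, collapse = ", ")
    expr <- parse(text = paste0("`:=`(", text, ")"))

    # Join on id and update the matched rows of DT1 by reference.
    # on = "id" names the join column, since DT1 has no key set here.
    res2 <- DT1[data2, eval(expr), on = "id"]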
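For comparison, the same update can be sketched without building code as text, by looping over the column names and calling data.table's `set()`. This is an alternative under stated assumptions, not the approach from the answer above: it assumes every id in `data2` exists in `DT1`, and the `before` copy exists only to check the result.

```r
library(data.table)

rows <- 10
DT1 <- data.table(id = 1:rows,
                  a = seq(0.2, 0.55, length.out = rows),
                  d = seq(0.6, 0.87, length.out = rows),
                  f = seq(0.52, 0.90, length.out = rows))
data2 <- data.frame(id = 3:1, a = rep(3, 3), d = rep(2, 3), f = rep(1, 3))
m.names <- c("a", "d", "f")

before <- copy(DT1)             # snapshot, used only for checking
idx <- match(data2$id, DT1$id)  # rows of DT1 in the order of data2$id

for (col in m.names) {
  # set() updates the chosen rows of one column by reference
  set(DT1, i = idx, j = col, value = DT1[[col]][idx] + data2[[col]])
}

print(DT1[1:4])
```

Rows 1 to 3 of `a`, `d` and `f` are increased by 3, 2 and 1 respectively, and the remaining rows are left unchanged. Because the column names are used as plain strings, `m.names` can be assigned dynamically without any `parse()` or `eval()`.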
