Passing column names to data.table programmatically

Question

I would like to be able to write a function that runs regressions in a data.table by groups and then nicely organizes the results. Here is a sample of what I would like to do:

require(data.table)
dtb = data.table(y=1:10, x=10:1, z=sample(1:10), weights=1:10, thedate=1:2)
models = c("y ~ x", "y ~ z")

res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})

#do more stuff with res

I would like to wrap all this into a function, since the #do more stuff part might be long. The issue I face is how to pass the various column names to data.table. For example, how do I pass the column name weights? How do I pass thedate? I envision a prototype that looks like this:

myfun = function(dtb, models, weights, dates)

Let me be clear: passing the formulas to my function is NOT the problem. If the column holding the weights (weights) and the column describing the date (thedate) were known in advance, my function could simply look like this:

myfun = function(dtb, models) {
  res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})

  #do more stuff with res
}

However, the column names corresponding to thedate and to the weights are not known in advance. I would like to pass them to my function like so:

#this will not work
myfun = function(dtb, models, w, d) {
  res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=w, data=.SD))),by=d]})

  #do more stuff with res
}
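A minimal sketch of the two building blocks, assuming a reasonably recent data.table: the by argument accepts a character vector of column names (including one stored in a variable), and [[ extracts a column given its name as a string. The variable names d and w below are only illustrative, and thedate is spelled out with rep() rather than relying on recycling.

```r
library(data.table)

dtb <- data.table(y = 1:10, x = 10:1, weights = 1:10, thedate = rep(1:2, 5))
d <- "thedate"  # name of the grouping column, held in a variable
w <- "weights"  # name of the weights column, held in a variable

# Grouping by the bare column name and by the string stored in `d`
# should give the same groups and counts
by_symbol <- dtb[, .N, by = thedate]
by_string <- dtb[, .N, by = d]

# `[[` with a string returns that column as a plain vector
wts <- dtb[[w]]
print(by_string)
```

The grouping part therefore only needs the name as a string; the weights are the harder part, because lm() looks up its weights argument inside data rather than in the caller's variables.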

Thanks

Answer

Here is a solution that relies on having the data in long format (which makes more sense to me in this case):

library(data.table)
# data.table ships its own melt() method, so reshape2 is not needed here
dtlong <- melt(dtb, measure.vars = c('x', 'z'))

foo <- function(f, d, by, w) {
  # get the name of the w argument (weights) as a string
  w.char <- deparse(substitute(w))
  # convert `list(a, b)` to `c('a', 'b')`
  # obviously, this would have to change depending on how `by` was defined
  by <- unlist(lapply(as.list(as.list(match.call())[['by']])[-1], as.character))
  # build the call, substituting the weights column name for `w`;
  # `f` is left as-is and is found in foo's environment when j is evaluated
  .c <- substitute(as.list(coef(lm(f, data = .SD, weights = w))),
                   list(w = as.name(w.char)))
  # actually perform the calculations
  d[, eval(.c), by = by]
}

foo(f = y ~ value, d = dtlong, by = list(variable, thedate), w = weights)

   variable thedate (Intercept)       value
1:        x       1   11.000000 -1.00000000
2:        x       2   11.000000 -1.00000000
3:        z       1    1.009595  0.89019190
4:        z       2    7.538462 -0.03846154
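For comparison, here is a sketch of the prototype from the question that takes both names as strings instead of unevaluated symbols. It is an alternative approach, not the answer's method: it copies the weights into a column with a fixed name (the name wts_ is a hypothetical choice, assumed not to clash with existing columns), so that the lm() call looks exactly like the version that works when the names are known, and it passes the date column name straight to by.

```r
library(data.table)

# Hypothetical helper: `w` and `d` are column names passed as strings
myfun <- function(dtb, models, w, d) {
  # Work on a copy so the caller's table is not modified by reference
  dt <- copy(dtb)
  # Store the weights under a fixed name that lm() can find inside .SD;
  # "wts_" is an arbitrary name assumed not to clash with existing columns
  set(dt, j = "wts_", value = dt[[w]])
  lapply(models, function(f) {
    # `by = d` works because `by` accepts a character vector of column names
    dt[, as.list(coef(lm(as.formula(f), weights = wts_, data = .SD))), by = d]
  })
}

dtb <- data.table(y = 1:10, x = 10:1, weights = 1:10, thedate = rep(1:2, 5))
res <- myfun(dtb, models = c("y ~ x"), w = "weights", d = "thedate")
print(res[[1]])
```

With this data, y ~ x gives an intercept of 11 and a slope of -1 in both date groups, matching the x rows above. Copying into a fixed column costs one copy of the table but avoids depending on how lm() resolves its weights argument through data and the formula's environment.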
