Run a function inside data.table in R
Question
Hi, I have some data in data.table format in R and I need to run a function on it.
Let's say I have a data.table called A with the columns "name", "height", and "weight".
I want to run a function, e.g. a linear regression, within the data.table for each name and store the slope coefficient and the RMSE in a results table.
A[, .(beta = lm(height ~ weight)$coefficients[2],
      RMSE = as.numeric(sqrt(crossprod(lm(height ~ weight)$residuals) /
                               (length(lm(height ~ weight)$residuals) -
                                  (length(coef(lm(height ~ weight))) - 1))) * 100)),
  by = .(name)]
My question: is there a way to save the lm(height ~ weight) result as an object and then access that object's data, so that data.table does not need to run lm() four times per group?
This runs, but it is too slow compared with using foreach to loop over "name", as I have millions of rows of data.
Thanks.
Recommended answer
By using an anonymous body, as suggested by Henrik, I was able to speed up the process!
A[, {
  # Fit the model once per group and reuse the fitted object
  model <- lm(height ~ weight)
  BETA <- model$coefficients[2]
  RMSE <- as.numeric(sqrt(crossprod(model$residuals) /
                            (length(model$residuals) - (length(coef(model)) - 1))) * 100)
  list(BETA = BETA, RMSE = RMSE)
}, by = .(name)]
Apparently, an anonymous body (the braced block in j, similar in spirit to a lambda) does not need a name; it is "run once and forget". Inside this body, the lm() function is run once per group and the result is stored in an object.
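To make this concrete, here is a minimal self-contained sketch of the same pattern on toy data. The data values are invented for illustration and are not from the original post; the RMSE expression is copied from the answer as written rather than replaced with a standard formula.

```r
library(data.table)

# Toy data: the values are invented purely for illustration.
set.seed(1)
A <- data.table(
  name   = rep(c("a", "b"), each = 20),
  weight = runif(40, 50, 90)
)
A[, height := 100 + 0.9 * weight + rnorm(.N, sd = 3)]

# The braces in j hold several statements. lm() is fitted once per group
# and the fitted object is reused through the local variable `model`.
out <- A[, {
  model <- lm(height ~ weight)
  r     <- model$residuals
  list(
    BETA = model$coefficients[[2]],
    RMSE = as.numeric(sqrt(crossprod(r) / (length(r) - (length(coef(model)) - 1))) * 100)
  )
}, by = .(name)]

print(out)
```

The result has one row per name, with the columns name, BETA and RMSE.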
We can then extract the required data from the model object, and finally return a list() so that j converts the extracted values into columns.
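Because j only needs to evaluate to a list, the same logic can also live in a named function that returns a list. The sketch below assumes a hypothetical helper called fit_stats (not part of the original answer or of data.table) and a small made-up table; each list element has length one, so every group contributes one row.

```r
library(data.table)

# Hypothetical helper, introduced only for illustration: fits the model once
# and returns a named list, which data.table turns into columns.
fit_stats <- function(y, x) {
  model <- lm(y ~ x)
  r <- residuals(model)
  list(
    BETA = coef(model)[[2]],
    # Same RMSE expression as in the answer, kept as written there.
    RMSE = as.numeric(sqrt(crossprod(r) / (length(r) - (length(coef(model)) - 1))) * 100)
  )
}

# Made-up example data.
A <- data.table(
  name   = rep(c("a", "b"), each = 5),
  weight = c(60, 65, 70, 75, 80, 55, 62, 68, 74, 81),
  height = c(160, 166, 169, 175, 181, 158, 163, 170, 172, 180)
)

# Each list element becomes a column; by = adds the grouping column.
out <- A[, fit_stats(height, weight), by = .(name)]
print(out)
```

Whether a named helper or an inline braced block reads better is mostly a matter of taste; the helper is easier to reuse and test on its own.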
Many thanks!