Generating variable names for dataframes based on the loop number in a loop in R

Problem description

I am developing and optimizing a linear model using the lm() function, followed by the step() function for optimization. I have added a variable to my dataframe using a random generator of 0s and 1s (50% chance each). I use this variable to subset the dataframe into a training set and a validation set: if a record is not assigned to the training set, it is assigned to the validation set. Using these subsets, I can estimate how good the fit of the model is (by using the predict function on the records in the validation set and comparing the predictions to the original values). I am interested in the coefficients of the optimized model and in the results of the KS test between the distributions of the predicted and actual results.
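For readers following along, the workflow described above might look roughly like the sketch below. The simulated data frame `dat`, its column names, and the model formula are assumptions made for illustration; they are not taken from the original question.

```r
# Sketch only: simulated data stands in for the asker's real data frame.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 + 1.5 * dat$x1 - 0.8 * dat$x2 + rnorm(n)

# Random 0/1 indicator, 50% chance each: 1 = training, 0 = validation
dat$train <- sample(0:1, n, replace = TRUE)
train.set <- dat[dat$train == 1, ]
valid.set <- dat[dat$train == 0, ]

# Fit the full model on the training set, then let step() prune it by AIC
full.model <- lm(y ~ x1 + x2 + x3, data = train.set)
opt.model  <- step(full.model, trace = 0)

# Predict on the validation set and compare the two distributions
predicted <- predict(opt.model, newdata = valid.set)
ks.result <- ks.test(predicted, valid.set$y)

coef(opt.model)     # coefficients of the optimized model
ks.result$p.value   # p-value of the two-sample KS test
```

Calling `set.seed()` once before the split makes a single run reproducible; inside a loop over several runs, each call to `sample()` draws a fresh split.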

All of my code was working fine, but when I wanted to test whether my model is sensitive to the subset I chose, I ran into some problems. To test this, I wanted to create a for (i in 1:10) loop, using a different random subset each time. This turned out to be quite a challenge for me (I have never used a for loop in R before).

Here's the problem (well, actually there are many problems, but here is one of them):

I would like to have a separate dataframe for each run in the loop, each with a unique name (for example: Run1, Run2, Run3). I have been able to create a variable with different strings using paste("Run", 1:10, sep = ""), but that just gives me a character vector of strings. How do I use these strings as names for my (subsetted) dataframes?

Another problem that I expect to encounter: subsequently, I want to take the fitted coefficients for each run and export them to Excel. Using coef(), I have been able to retrieve the coefficients. However, the number of coefficients included in the model may change from one simulation run to the next because of the optimization algorithm. This will almost certainly give me some trouble when pasting them into the same dataframe. Any thoughts on that?

Thanks for your help.

Recommended answer

For your first question:

You can use

df.names <- paste("Run", 1:10, sep = "")

Then, create your for loop and do the following to give the data frames the names you want:

for (i in 1:10) {
  d.frame <- data.frame(run = i)  # placeholder: create your data frame here
  assign(df.names[i], d.frame)    # bind it to the name "Run1", "Run2", ...
}

You will now end up with ten data frames with ten different names.
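Once the data frames have been created this way, they can be looked up again by name. The following is a minimal sketch, assuming placeholder data; the `value` column and the variable names `run3` and `all.runs` are hypothetical and only serve the illustration.

```r
# Sketch only: placeholder data frames stand in for the real subsets.
df.names <- paste("Run", 1:10, sep = "")

for (i in 1:10) {
  d.frame <- data.frame(run = i, value = rnorm(5))  # hypothetical contents
  assign(df.names[i], d.frame)
}

# Look up a single data frame by its generated name
run3 <- get(df.names[3])

# Collect all ten data frames into one named list
all.runs <- mget(df.names)
names(all.runs)
```

`get()` returns one object for a given name, while `mget()` returns a named list of objects, which is often easier to loop over than ten separate variables.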

For your second question about the coefficients:

As far as I can tell, these don't naturally fit into your data frame structure, since the number of coefficients can differ between runs. You should consider using lists, as they allow elements of different classes. In other words, for each run, create a list containing a data frame and a numeric vector with your coefficients.
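One way this list-based approach could look is sketched below. The simulated data, the element names `data` and `coefs`, and the CSV export are assumptions for illustration. The final step, which lines the coefficients up by name and fills the gaps with NA, is one possible way to get them into a single table and is not part of the original answer.

```r
# Sketch only: simulated data stands in for the real data frame.
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)

runs <- list()
for (i in 1:10) {
  in.train  <- sample(0:1, n, replace = TRUE) == 1   # new random subset per run
  train.set <- dat[in.train, ]
  model     <- step(lm(y ~ ., data = train.set), trace = 0)
  # One list per run: the training data plus the coefficient vector
  runs[[paste("Run", i, sep = "")]] <- list(data = train.set, coefs = coef(model))
}

# Union of all coefficient names that appear in any run
all.terms <- unique(unlist(lapply(runs, function(r) names(r$coefs))))

# One row per run; terms that step() dropped in a run become NA
coef.table <- do.call(rbind, lapply(runs, function(r) r$coefs[all.terms]))
colnames(coef.table) <- all.terms
coef.table <- as.data.frame(coef.table)

# A CSV file can be opened in Excel; it is written to a temporary folder here
write.csv(coef.table, file.path(tempdir(), "coefficients.csv"))
```

Indexing a named vector by a name it does not contain returns NA, which is what allows runs with different sets of coefficients to share the same columns.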
