How to estimate many GLM models in Julia?


Question

I have a dataset of 5000 variables: one target and 4999 covariates. I want to estimate one GLM for each target-covariate combination (4999 models).

How can I do that without manually typing 4999 formulas for GLM?

In R I would simply define a list of 4999 strings ("target ~ x1"), convert each string to a formula, and use map to estimate multiple GLMs. Can something similar be done in Julia, or is there an elegant alternative?

Thanks.

Recommended answer

You can create the formula programmatically via Term objects. This is covered in the StatsModels documentation, but consider the following simple example, which should meet your needs:

Start with some dummy data:

julia> using DataFrames, GLM

julia> df = hcat(DataFrame(y = rand(10)), DataFrame(rand(10, 5), :auto))  # :auto generates x1..x5 and is required by DataFrames 1.0 and later
10×6 DataFrame
│ Row │ y         │ x1        │ x2       │ x3        │ x4         │ x5       │
│     │ Float64   │ Float64   │ Float64  │ Float64   │ Float64    │ Float64  │
├─────┼───────────┼───────────┼──────────┼───────────┼────────────┼──────────┤
│ 1   │ 0.0200963 │ 0.924856  │ 0.947904 │ 0.429068  │ 0.00833488 │ 0.547378 │
│ 2   │ 0.169498  │ 0.0915296 │ 0.375369 │ 0.0341015 │ 0.390461   │ 0.835634 │
│ 3   │ 0.900145  │ 0.502495  │ 0.38106  │ 0.47253   │ 0.637731   │ 0.814095 │
│ 4   │ 0.255163  │ 0.865253  │ 0.791909 │ 0.0833828 │ 0.741899   │ 0.961041 │
│ 5   │ 0.651996  │ 0.29538   │ 0.161443 │ 0.23427   │ 0.23132    │ 0.947486 │
│ 6   │ 0.305908  │ 0.170662  │ 0.569827 │ 0.178898  │ 0.314841   │ 0.237354 │
│ 7   │ 0.308431  │ 0.835606  │ 0.114943 │ 0.19743   │ 0.344216   │ 0.97108  │
│ 8   │ 0.344968  │ 0.452961  │ 0.595219 │ 0.313425  │ 0.102282   │ 0.456764 │
│ 9   │ 0.126244  │ 0.593456  │ 0.818383 │ 0.485622  │ 0.151394   │ 0.043125 │
│ 10  │ 0.60174   │ 0.8977    │ 0.643095 │ 0.0865611 │ 0.482014   │ 0.858999 │

Now when you run a linear model with GLM, you'd do something like lm(@formula(y ~ x1), df), which indeed can't easily be used in a loop to construct different formulas. We'll therefore follow the docs and create the output of the @formula macro directly - remember macros in Julia just transform syntax to other syntax, so they don't do anything we can't write ourselves!

julia> lm(Term(:y) ~ Term(:x1), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

y ~ 1 + x1

Coefficients:
──────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)   0.428436    0.193671   2.21    0.0579  -0.0181696   0.875041
x1           -0.106603    0.304597  -0.35    0.7354  -0.809005    0.595799
──────────────────────────────────────────────────────────────────────────

You can verify for yourself that the above is equivalent to lm(@formula(y ~ x1), df).
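One way to run that check is sketched below. It assumes a small dummy table with columns named y and x1 (illustrative names, matching the example above) and compares the coefficient vectors of the two fits.

```julia
using DataFrames, GLM

# Small dummy table; the column names y and x1 mirror the example above.
df = DataFrame(y = rand(10), x1 = rand(10))

m_macro = lm(@formula(y ~ x1), df)       # formula built by the macro
m_terms = lm(Term(:y) ~ Term(:x1), df)   # formula built from Term objects

# Both fits should report the same intercept and slope.
@show coef(m_macro) ≈ coef(m_terms)
```

Comparing with ≈ rather than == avoids depending on exact floating-point equality, although both calls should go through the same fitting code.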

Now it's hopefully an easy step to building the loop that you're looking for (restricted to two covariates below to limit the output):


julia> for x ∈ names(df[:, Not(:y)])[1:2]
           @show lm(term(:y) ~ term(x), df)
       end
lm(term(:y) ~ term(x), df) = StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

y ~ 1 + x1

Coefficients:
──────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)   0.428436    0.193671   2.21    0.0579  -0.0181696   0.875041
x1           -0.106603    0.304597  -0.35    0.7354  -0.809005    0.595799
──────────────────────────────────────────────────────────────────────────
lm(term(:y) ~ term(x), df) = StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

y ~ 1 + x2

Coefficients:
─────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept)   0.639633    0.176542   3.62    0.0068   0.232527    1.04674
x2           -0.502327    0.293693  -1.71    0.1256  -1.17958     0.17493
─────────────────────────────────────────────────────────────────────────
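With thousands of covariates, printing every fit is impractical, so it may be more useful to store the models and tabulate a few numbers from each. The following is a minimal sketch under assumed names (`covariates`, `models`, and `results` are illustrative and do not appear in the original answer), using five dummy covariates in place of 4999.

```julia
using DataFrames, GLM

# Dummy data: a response y and five covariates x1..x5.
df = DataFrame(y = rand(20))
for i in 1:5
    df[!, "x$i"] = rand(20)
end

# Every column name except the response, as a Vector{String}.
covariates = names(df, Not(:y))

# Fit one model per covariate and key it by the covariate's name.
models = Dict(x => lm(term(:y) ~ term(x), df) for x in covariates)

# Collect the slope estimate and its standard error from each fit.
# Index 2 is the covariate; index 1 is the implicit intercept.
results = DataFrame(
    covariate = covariates,
    slope     = [coef(models[x])[2] for x in covariates],
    stderr    = [stderror(models[x])[2] for x in covariates],
)
```

For a model other than ordinary least squares, the `lm` call can be swapped for `glm`, which takes the distribution and link as extra arguments, for example `glm(term(:y) ~ term(x), df, Binomial(), LogitLink())` for a binary response.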

As Dave points out below, it's helpful to use the term() function here to create our terms rather than the Term() constructor directly - this is because names(df) returns a vector of Strings, while the Term() constructor expects Symbols. term() has a method for Strings that handles the conversion automatically.
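The difference can be seen in a short sketch. It assumes a DataFrames version in which `names` returns strings, and the two-column table is illustrative only.

```julia
using DataFrames, GLM

df = DataFrame(y = rand(10), x1 = rand(10))

# names(df) yields Strings, not Symbols.
colname = names(df)[2]
@show colname isa String

# term() accepts the String and produces the same Term that the
# constructor builds from the corresponding Symbol.
t = term(colname)
@show t isa Term
@show t.sym == :x1
```

Calling `Term(Symbol(colname))` would work as well; `term` just does that conversion for you.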
