Stata: combining coefficients/standard errors from several regressions in a single dataset (number of variables may differ)

Question

I have already asked a question about storing coefficients and standard errors of several regressions in a single dataset.

Let me just reiterate the objective of my initial question:

I would like to run several regressions and store their results in a DTA file that I could later use for analysis. My constraints are:

  1. I cannot install modules (I am writing code for other people and am not sure what modules they have installed).
  2. Some of the regressors are factor variables.
  3. Each regression differs only in the dependent variable, so I would like to store that in the final dataset to keep track of which regression the coefficients/variances correspond to.

The solution suggested by Roberto Ferrer worked well on my test data, but it turns out not to work so well on some other types of data. The reason is that my sample changes slightly from one regression to the next, and some factor variables do not take the same number of values in each regression. As a result, the fixed effects (created on the fly using i.myvar as a regressor) do not have the same cardinality.

Let's say that I decide to include year fixed effects (as in: year-specific intercepts) using i.year, but in one regression there is no observation for the year 2006. That means that this particular regression will have one fewer regressor (the dummy corresponding to year==2006 does not get created) and, as a result, a smaller matrix storing the coefficients.

This results in a conformability error when trying to stack the matrices together.
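The failure can be reproduced without any regression. The following is only a minimal sketch: the matrices A and B are hypothetical stand-ins for two coefficient vectors with different numbers of columns, not the matrices from the actual code.

```stata
// Hypothetical stand-ins for e(b) from two regressions
matrix A = (1, 2, 3)    // three coefficients
matrix B = (4, 5)       // one regressor fewer

// Stacking rows requires the same number of columns, so this fails;
// capture traps the error so the do-file keeps running
capture matrix C = A \ B
display "return code: " _rc    // 503 is Stata's conformability error
```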

I was wondering whether there is a way to make the initial solution robust to a varying number of regressors. (Perhaps saving each regression's results as a .dta file, then merging?)

I am still subject to the constraint that I cannot rely on external packages.

Solution

You can follow the strategy of appending datasets, making small changes to the code in the question you reference:

clear
set more off

save test.dta, emptyok replace

foreach depvar in marriage divorce {

    // test data
    sysuse census, clear 
    generate constant = 1
    replace marriage = . if region == 4 

    // regression
    reg `depvar' popurban i.region constant, robust noconstant  // explicit constant, so suppress the built-in one
    matrix result_matrix = e(b)\vecdiag(e(V))                   // grab coeffs and their variances in a 2xK matrix
    matrix rownames result_matrix = `depvar'_b `depvar'_v       // label the two rows with the dependent variable

    // get original column names of matrix
    local names : colfullnames result_matrix

    // get original row names of matrix (and row count)
    local rownames : rowfullnames result_matrix
    local c : word count `rownames'

    // make original names legal variable names
    local newnames
    foreach name of local names {
        local newnames `newnames' `=strtoname("`name'")'
    }

    // rename columns of matrix
    matrix colnames result_matrix = `newnames'

    // from matrix to dataset
    clear
    svmat result_matrix, names(col)

    // add matrix row names to dataset
    gen rownames = ""
    forvalues i = 1/`c' {
        replace rownames = "`:word `i' of `rownames''" in `i'
    }

    // append
    append using "test.dta"
    save "test.dta", replace

}

// list
order rownames
list, noobs

The result is what you want. The drawback is that the dataset is reloaded on every pass through the loop, so the data are loaded once per regression you estimate.
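The approach works because append does not require the two datasets to share the same variables: a variable missing from one of them is simply filled with missing values. A small sketch with made-up data (the variables x, y and z are hypothetical) illustrates this behavior:

```stata
// First dataset: variables x and y
clear
input x y
1 2
end
tempfile first
save `first'

// Second dataset: variables x and z (no y)
clear
input x z
3 4
end

// Observations in memory come first, appended ones follow;
// variables absent from one dataset are set to missing
append using `first'
list, noobs
```

In the regression loop this means a regression with fewer factor levels just leaves the corresponding columns empty in its two rows.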

You may want to take a look at the post command (see help postfile) and check whether it allows a more efficient solution. statsby could also work, but you would need to find a smart way of renaming the stored variables.
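One possible way to use post is sketched below. This is not the code from the referenced question but an assumption about how it could look: it stores the results in long format (one row per dependent variable and coefficient), uses the built-in constant instead of a generated one, and the file name results.dta and the variable names depvar, term, b and v are chosen only for illustration. The data are loaded once, and a differing number of regressors simply yields a differing number of rows.

```stata
clear
set more off

// test data, loaded only once
sysuse census, clear
replace marriage = . if region == 4

// open a results file: one row per (dependent variable, coefficient)
tempname handle
postfile `handle' str32 depvar str32 term double b double v ///
    using results.dta, replace

foreach depvar in marriage divorce {

    regress `depvar' popurban i.region, vce(robust)

    // copy estimates so individual elements can be indexed
    matrix b = e(b)
    matrix V = e(V)
    local terms : colnames b
    local k = colsof(b)

    // post each coefficient together with its variance
    forvalues j = 1/`k' {
        local term : word `j' of `terms'
        post `handle' ("`depvar'") ("`term'") (b[1,`j']) (V[`j',`j'])
    }
}

postclose `handle'

// inspect the stored results
use results.dta, clear
list, noobs sepby(depvar)
```

If a wide layout like the one produced by the append approach is needed, the long dataset could be reshaped afterwards, once the term names have been made legal variable names.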
