How to get standardised (Beta) coefficients for multiple linear regression using statsmodels


Question

When I call .summary() on an OLS model fitted with statsmodels (with the data held in a pandas DataFrame), the OLS Regression Results table includes the following fields.

coef    std err          t      P>|t|      [0.025      0.975]

How can I get the standardised coefficients (which exclude the intercept), similar to the standardized coefficients that SPSS reports?

Recommended answer

You just need to standardize your original DataFrame first by converting each column to z-scores (subtract the column mean, then divide by the column standard deviation), and then run the linear regression on the standardized data.

Assume your DataFrame is named df and has independent variables x1, x2, and x3 and a dependent variable y. Consider the following code:

import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf

# standardizing dataframe
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)

# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()

# checking results (print() so the table also shows when run as a script)
print(result.summary())

Now the coef column shows the standardized (beta) coefficients, so you can compare the relative influence of each predictor on the dependent variable. Because every variable has mean zero after z-scoring, the fitted intercept is zero up to floating-point error.
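
As a sanity check, the same beta coefficients can be recovered from an ordinary (unstandardized) fit by rescaling each slope by sd(x) / sd(y). The sketch below is not part of the original answer: it uses a synthetic DataFrame (the seed, sample size, and column scales are made up for illustration) and compares the two routes.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic data, purely illustrative (not from the original question)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(0, 1, n),
    "x2": rng.normal(5, 10, n),
    "x3": rng.normal(-2, 0.1, n),
})
df["y"] = 2.0 * df["x1"] + 0.3 * df["x2"] - 4.0 * df["x3"] + rng.normal(0, 1, n)

formula = "y ~ x1 + x2 + x3"

# Route 1: z-score every column, then fit
df_z = df.apply(stats.zscore)
beta_z = smf.ols(formula, data=df_z).fit().params.drop("Intercept")

# Route 2: fit on the raw data, then rescale each slope by sd(x) / sd(y)
raw = smf.ols(formula, data=df).fit().params.drop("Intercept")
sd = df.std(ddof=0)
beta_rescaled = raw * sd[raw.index] / sd["y"]

# The two columns should agree up to floating-point error
print(pd.DataFrame({"z-scored fit": beta_z, "rescaled raw fit": beta_rescaled}))
```

The choice of ddof does not matter for the rescaling as long as the same value is used for x and y, since only the ratio of standard deviations enters.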

Notes:

  1. Keep in mind that you need .dropna(). Otherwise, stats.zscore returns NaN for every value in a column that contains any missing values.
  2. Instead of using .select_dtypes(), you can select the columns manually, but make sure every column you select is numeric.
  3. If you only care about the standardized (beta) coefficients, you can use result.params to return just those. They are usually displayed in scientific notation; you can round them with something like round(result.params, 5) or result.params.round(5).
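
The notes above can be combined into one small function. This is a minimal sketch rather than part of the original answer: the helper name standardized_betas and the toy data are hypothetical. It selects the model columns by hand, drops incomplete rows for those columns only, and returns the rounded slopes without the intercept.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf


def standardized_betas(df, y, xs, decimals=5):
    """Hypothetical helper: z-score the chosen columns and return rounded betas."""
    cols = [y] + list(xs)
    # Drop rows with missing values in the model columns only;
    # stats.zscore would otherwise turn a whole column into NaN.
    clean = df[cols].dropna()
    df_z = clean.apply(stats.zscore)
    result = smf.ols(f"{y} ~ {' + '.join(xs)}", data=df_z).fit()
    # The intercept is numerically zero after z-scoring, so leave it out.
    return result.params.drop("Intercept").round(decimals)


# Toy data with one missing value and one non-numeric column (illustrative only)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "x1": rng.normal(size=50),
    "x2": rng.normal(size=50),
    "x3": rng.normal(size=50),
    "label": ["a"] * 50,
})
df["y"] = df["x1"] - 2 * df["x2"] + 0.5 * df["x3"] + rng.normal(size=50)
df.loc[3, "x2"] = np.nan

# A single missing value makes zscore return NaN for the whole column
all_nan = np.isnan(stats.zscore(df["x2"].to_numpy())).all()
print(all_nan)

betas = standardized_betas(df, "y", ["x1", "x2", "x3"])
print(betas)
```

Dropping missing rows only for the columns that enter the model keeps more observations than calling .dropna() on every numeric column of the DataFrame.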
