Statsmodels formula API (patsy): How to exclude a subset of interaction components?

Question

I'm building a WLS model (statsmodels.formula.api.wls) using the statsmodels formula API (from patsy), and I'm using interactions between factors. Some of these interactions are predictive whereas others are not. Is there a way to include only a subset of the interactions in the model without resorting to building a design matrix by hand?

Alternatively, is there a way to constrain the estimated coefficients of a subset of the model variables to be equal to zero?

Recommended answer

I'm not sure I understand exactly what you need, but I suggest you start with the truly excellent patsy docs (patsy handles formulas for statsmodels). There's a nice section on categorical data: http://patsy.readthedocs.org/en/latest/index.html

My guess is that what you want is going to be hard to achieve with a single formula call. I would probably just use patsy to build a design matrix with more terms than I need and then drop columns. For example:

In [28]: import statsmodels.api as sm  # sm.WLS below takes y and X directly, not a formula
In [29]: import pandas as pd
In [30]: import numpy as np
In [31]: import patsy
In [32]: url = "http://vincentarelbundock.github.com/Rdatasets/csv/HistData/Guerry.csv"
In [33]: df = pd.read_csv(url)
In [34]: w = np.ones(df.shape[0])
In [35]: f = 'Lottery ~ Wealth : C(Region)'
In [36]: y,X = patsy.dmatrices(f, df, return_type='dataframe')
In [37]: X.head()
Out[37]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns:
Intercept                5  non-null values
Wealth:C(Region)[nan]    5  non-null values
Wealth:C(Region)[C]      5  non-null values
Wealth:C(Region)[E]      5  non-null values
Wealth:C(Region)[N]      5  non-null values
Wealth:C(Region)[S]      5  non-null values
Wealth:C(Region)[W]      5  non-null values
dtypes: float64(7)

In [38]: X = X.iloc[:, [2, 3, 4]]  # .ix was removed from pandas; .iloc selects by position
In [39]: X.head()
Out[39]: 
   Wealth:C(Region)[C]  Wealth:C(Region)[E]  Wealth:C(Region)[N]
0                    0                   73                    0
1                    0                    0                   22
2                   61                    0                    0
3                    0                   76                    0
4                    0                   83                    0

In [40]: mod = sm.WLS(y, X, 1./w).fit()
In [41]: mod.params
Out[41]: 
Wealth:C(Region)[C]    1.084430
Wealth:C(Region)[E]    0.650396
Wealth:C(Region)[N]    1.021582
