Pandas: Trouble implementing Panel OLS
Question
I'm having some difficulty understanding how to implement Panel OLS in pandas. I received help on this topic earlier and thought I understood the situation, but now that I am trying to implement it I am stuck. Below is my data:
import pandas as pd

url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/crsp.dime.mpl.df.1'
# Load only the columns needed for the regression ('log_diff_rgdp' was listed twice).
df = pd.read_csv(url, usecols=['date', 'cid', 'log_diff_rgdp', 'billsum_support',
                               'years_exp', 'leg_totalbills', 'unemployment', 'expendituresfor',
                               'direct_expenditures', 'indirect_expenditures', 'Republican', 'sen'])
df.head(1)
         cid  date  log_diff_rgdp  unemployment  leg_totalbills  years_exp  Republican  sen  billsum_support  expendituresfor  direct_expenditures  indirect_expenditures
0  N00013870  2007       0.026069           4.6              44          5         1.0  1.0              1.0              4.0                  4.0                    0.0
df=df.T.to_panel()
df=df.transpose(2,0,1)
df
<class 'pandas.core.panel.Panel'>
Dimensions: 505 (items) x 10 (major_axis) x 72 (minor_axis)
Items axis: N00000010 to N00035686
Major_axis axis: 2005 to 2014
Minor_axis axis: index to indirect_expenditures
My understanding (which may be wrong) is that the Items axis contains all of the panels, that the Minor_axis contains all of the columns in each panel, and that the Major_axis is the time index. I posted the first row of my data as it looked before converting it to a Panel; billsum_support is the fourth column from the end. However, when I try to regress with billsum_support as the Y variable, I get the following error.
reg=PanelOLS(y=df['billsum_support'],x=df[['years_exp', 'unemployment', 'dir_ind_expendituresfor']],time_effects=True)
reg
KeyError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
1875 try:
-> 1876 return self._engine.get_loc(key)
1877 except KeyError:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4027)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3891)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12408)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12359)()
KeyError: 'billsum_support'
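One possible reading of this KeyError: indexing a Panel with a string looks the key up on the Items axis, and the Items axis printed above holds cid values (N00000010 to N00035686), not variable names. The sketch below imitates that layout with a plain dict of DataFrames, since Panel is no longer available in later pandas releases, and contrasts it with a stacked frame in which the variables are columns. The toy rows, `long_df`, `per_cid`, and `stacked` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical toy rows in the same long layout as the CSV in the question.
long_df = pd.DataFrame({
    "cid": ["N1", "N1", "N2", "N2"],
    "date": [2007, 2008, 2007, 2008],
    "billsum_support": [1.0, 0.0, 1.0, 1.0],
    "years_exp": [5, 6, 2, 3],
})

# Panel-like layout from the question: one table per cid (the "items").
per_cid = {
    cid: group.drop(columns="cid").set_index("date")
    for cid, group in long_df.groupby("cid")
}

# String keys select an item (a cid), so a variable name is not a valid key.
has_variable_key = "billsum_support" in per_cid

# Stacked layout: (cid, date) index, variables as columns.
stacked = long_df.set_index(["cid", "date"]).sort_index()
y = stacked["billsum_support"]   # works: the variable is a column
x = stacked[["years_exp"]]
```

In the stacked layout, `y` and `x` can be selected by variable name, which is the form of selection the `PanelOLS` call in the question attempts.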
I have seen the working example here, but that person seems to have their data in stacked format rather than in a Panel. Does anyone with Panel OLS experience understand what I am doing wrong here?
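For readers unfamiliar with the `time_effects=True` option used in the call above: as far as I understand it, the option adds time dummies to the regression. The following is a minimal sketch of that idea using ordinary least squares from NumPy on stacked data. It is not the `PanelOLS` implementation, and the toy data, the `toy` frame, and the coefficient names are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data built so the true coefficients are known:
# billsum_support = 1 + 2 * years_exp, plus an extra 2 in 2008.
toy = pd.DataFrame({
    "cid": ["N1", "N2", "N3", "N1", "N2", "N3"],
    "date": [2007, 2007, 2007, 2008, 2008, 2008],
    "years_exp": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
})
toy["billsum_support"] = 1 + 2 * toy["years_exp"] + 2 * (toy["date"] == 2008)

# One dummy per year, dropping the first year to avoid collinearity
# with the intercept.
year_dummies = pd.get_dummies(toy["date"], prefix="year", drop_first=True).astype(float)

# Design matrix: intercept, regressor, time dummies.
design = np.column_stack([
    np.ones(len(toy)),
    toy["years_exp"].to_numpy(),
    year_dummies.to_numpy(),
])

# Plain least squares on the stacked rows.
coef, *_ = np.linalg.lstsq(design, toy["billsum_support"].to_numpy(), rcond=None)
intercept, slope, effect_2008 = coef
```

Because the toy data contain no noise, the fit recovers the intercept, the slope on `years_exp`, and the 2008 shift used to build the data.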
Recommended answer
I got it. Following up on ptrj's suggestion and doing some simple exploring, I found the solution and will post it in the question:
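import numpy as np

# Pivot to one row per date with one column per (variable, cid) pair; fill gaps with 0.
df = df.pivot_table(index='date', columns='cid', fill_value=0, aggfunc=np.mean)
df = df.T.to_panel()
df = df.transpose(2, 1, 0)
# Convert the Panel back into a stacked (long) DataFrame.
df = df.to_frame()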
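`Panel` and `DataFrame.to_panel` were removed from later pandas releases, so the snippet above only runs on old versions. The sketch below is one way to get a comparable result without them, assuming the goal is a balanced stacked frame indexed by (cid, date) with the variables as columns. It has not been checked against the output of the original code, and the toy rows and the `long_df`, `wide`, and `stacked` names are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; N2 has no observation for 2008.
long_df = pd.DataFrame({
    "cid": ["N1", "N1", "N2"],
    "date": [2007, 2008, 2007],
    "billsum_support": [1.0, 0.0, 1.0],
    "years_exp": [5, 6, 2],
})

# Same pivot as in the answer: one row per date, columns keyed by
# (variable, cid), with missing cells filled with 0.
wide = long_df.pivot_table(index="date", columns="cid", fill_value=0, aggfunc="mean")

# Move the cid level from the columns back into the index, then order
# the index as (cid, date).
stacked = wide.stack("cid").swaplevel().sort_index()
```

The pivot with `fill_value=0` is what balances the panel: the missing (N2, 2008) combination appears in `stacked` as a row of zeros. Whether zero is a sensible fill value depends on the variable, so treat that choice as carried over from the answer rather than as a recommendation.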