Pandas: Trouble implementing Panel OLS

Problem description

I'm having a hard time understanding how to implement Panel OLS in pandas. I received help on this topic and thought I understood the situation, but now that I am trying to implement it I am running into difficulty. Below is my data:

import pandas as pd

url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/crsp.dime.mpl.df.1'

# Load only the columns needed for the regression.
df = pd.read_csv(url, usecols=[
    'date', 'cid', 'log_diff_rgdp', 'billsum_support', 'years_exp',
    'leg_totalbills', 'unemployment', 'expendituresfor',
    'direct_expenditures', 'indirect_expenditures', 'Republican', 'sen'])
df.head(1)

         cid  date  log_diff_rgdp  unemployment  leg_totalbills  years_exp  Republican  sen  billsum_support  expendituresfor  direct_expenditures  indirect_expenditures
0  N00013870  2007       0.026069           4.6              44          5         1.0  1.0              1.0              4.0                  4.0                    0.0


df=df.T.to_panel()

df=df.transpose(2,0,1)

df

<class 'pandas.core.panel.Panel'>
Dimensions: 505 (items) x 10 (major_axis) x 72 (minor_axis)
Items axis: N00000010 to N00035686
Major_axis axis: 2005 to 2014
Minor_axis axis: index to indirect_expenditures

My understanding (which may be wrong) is that the Items axis contains all of the panels, the Minor_axis contains all of the columns in each panel, and the Major_axis is the time index. Above is the first row of my data before converting it to a Panel; billsum_support is the fourth column from the end. However, when I try to regress with billsum_support as the Y variable, I get the following error.

reg=PanelOLS(y=df['billsum_support'],x=df[['years_exp', 'unemployment', 'dir_ind_expendituresfor']],time_effects=True)
reg
KeyError                                  Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   1875             try:
-> 1876                 return self._engine.get_loc(key)
   1877             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4027)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3891)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12408)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12359)()

KeyError: 'billsum_support'
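One way to read this error: according to the Panel summary above, the Items axis holds the legislator ids (N00000010 to N00035686), and indexing a Panel with df[...] selects from Items, so a variable name is not found there. The sketch below mimics that layout with a plain dict of DataFrames. The dict is only a stand-in for a Panel, and the ids and values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the Panel in the question: one small DataFrame per
# legislator (cid), indexed by year, with the variables as columns.
# All values here are invented for illustration.
years = [2007, 2008]
panel_like = {
    "N00013870": pd.DataFrame(
        {"billsum_support": [1.0, 0.0], "years_exp": [5, 6]}, index=years
    ),
    "N00000010": pd.DataFrame(
        {"billsum_support": [0.0, 1.0], "years_exp": [2, 3]}, index=years
    ),
}

# The top-level keys are legislator ids, so looking up a variable name
# at that level fails, which mirrors the KeyError in the traceback.
has_variable_key = "billsum_support" in panel_like

# Stacking the per-legislator frames gives one row per (cid, date) pair
# and one column per variable, so the variable can be selected by name.
stacked = pd.concat(panel_like, names=["cid", "date"])
y = stacked["billsum_support"]
```

In the stacked layout the variables are columns, which is the shape that a lookup such as `stacked["billsum_support"]` expects.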

I have seen the working example here, but that person seems to have their data in stacked format instead of a Panel. Does anyone with experience using PanelOLS understand what I am doing wrong here?

Recommended answer

I got it. Following up on ptrj and doing some simple exploring, I found the solution and will post it in the question:

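import numpy as np

# Pivot to one row per date, with columns keyed by (variable, cid).
df = df.pivot_table(index='date', columns='cid', fill_value=0, aggfunc=np.mean)

df = df.T.to_panel()

df = df.transpose(2, 1, 0)

# Convert the Panel back into a stacked DataFrame.
df = df.to_frame()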
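For readers on a current pandas release, where Panel and the old pandas PanelOLS are no longer available, a regression with time effects can be approximated on a stacked frame directly. The following is a minimal sketch under stated assumptions, not the method used above: the frame is assumed to have a (date, cid) MultiIndex with one column per variable, the data are randomly generated, and the time effects are handled by subtracting each year's mean before an ordinary least-squares fit with NumPy. That demeaning gives the same slope estimates as including one dummy variable per year.

```python
import numpy as np
import pandas as pd

# Made-up stacked panel: one row per (date, cid), one column per variable.
index = pd.MultiIndex.from_product(
    [[2007, 2008, 2009], ["A", "B", "C", "D"]], names=["date", "cid"]
)
rng = np.random.default_rng(0)
stacked = pd.DataFrame(
    {"years_exp": rng.normal(size=12), "unemployment": rng.normal(size=12)},
    index=index,
)

# Build y with known slopes (2.0 and -1.0) plus a different shock per year,
# so the result of the fit can be checked against the true values.
shock_by_year = {2007: 0.5, 2008: -1.0, 2009: 3.0}
year_shock = np.array(
    [shock_by_year[d] for d in stacked.index.get_level_values("date")]
)
stacked["billsum_support"] = (
    2.0 * stacked["years_exp"] - 1.0 * stacked["unemployment"] + year_shock
)

# Time effects: subtract the per-year mean from every variable.
cols = ["billsum_support", "years_exp", "unemployment"]
demeaned = stacked[cols] - stacked[cols].groupby(level="date").transform("mean")

# Ordinary least squares on the demeaned data.
X = demeaned[["years_exp", "unemployment"]].to_numpy()
y = demeaned["billsum_support"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the toy outcome has no noise beyond the yearly shocks, `coef` recovers the two slopes used to build it. With real data the same steps apply, but standard errors would need to be computed separately.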
