Python Panel Data


Question

I usually use Stata, but now I want to use Python and have been struggling to create a panel data set. I tried pandas.Panel but could not get it to work. I have the following dataset:

  date  id1   id2
  2000  100   50
  2001  101   48

Now I want to reshape it to look like this:

    date  id   variable
    2000   1    100
    2000   2    50
    2001   1    101
    2001   2    48

Next, I want to identify a time variable and an id variable so that I can run panel functions. I also tried DataFrame.stack(), but that does not sort by id. How do I do this, or am I missing some nice time-series functionality in pandas?

Sorry for the question. I am sure this has been answered somewhere, but I have been trying for several hours and cannot figure it out.

Recommended answer

Given the input data, either as a list of records:

data = [
    {"date": 2000, "id1": 100, "id2": 50},
    {"date": 2001, "id1": 101, "id2": 48}
]

or, equivalently, as a dict of columns:

data = {
    "date": [2000, 2001],
    "id1": [100, 101],
    "id2": [50, 48],
}

such that

import pandas as pd

df = pd.DataFrame(data)
df

"melt" the pandas DataFrame from wide to long format:

melted = pd.melt(df, id_vars="date", var_name="id", value_name="variable")

# Optional amendments
melted["id"] = melted["id"].str.replace("id", "")
melted.sort_values(by="date", inplace=True)
melted.reset_index(inplace=True, drop=True)

melted
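For reference, the steps above can be combined into one self-contained sketch. Two details are additions that are not part of the answer: the id column is cast to int, and the rows are sorted by both date and id so that the order within each year is deterministic. The name long_df is only illustrative.

```python
import pandas as pd

# Wide input: one row per date, one column per entity (id1, id2).
df = pd.DataFrame({"date": [2000, 2001], "id1": [100, 101], "id2": [50, 48]})

long_df = (
    pd.melt(df, id_vars="date", var_name="id", value_name="variable")
    # Strip the literal "id" prefix and store the entity id as an integer
    # (assumption: all value columns are named "id<number>").
    .assign(id=lambda d: d["id"].str.replace("id", "", regex=False).astype(int))
    # Sorting on both keys makes the row order fully determined.
    .sort_values(["date", "id"])
    .reset_index(drop=True)
)

print(long_df)
# Rows as (date, id, variable):
# (2000, 1, 100), (2000, 2, 50), (2001, 1, 101), (2001, 2, 48)
```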

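The resulting melted DataFrame is in long format: one row per (date, id) pair, with the columns date, id, and variable.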
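The question also asks how to declare a time variable and an id variable. The answer does not cover this, so the following is only a sketch of one common approach: put id and date into a MultiIndex (loosely comparable to xtset id date in Stata) and use groupby on the id level for per-entity operations. The names panel, lag, and diff are illustrative, and shift(1) assumes each id has consecutive years with no gaps.

```python
import pandas as pd

# Long-format data, as produced by melting the example input.
long_df = pd.DataFrame(
    {
        "date": [2000, 2000, 2001, 2001],
        "id": [1, 2, 1, 2],
        "variable": [100, 50, 101, 48],
    }
)

# Index by (id, date): the entity and time dimensions of the panel.
panel = long_df.set_index(["id", "date"]).sort_index()

# Per-entity lag and first difference, computed within each id.
panel["lag"] = panel.groupby(level="id")["variable"].shift(1)
panel["diff"] = panel.groupby(level="id")["variable"].diff()

print(panel)
```

Grouping on the id level keeps the lag and difference from leaking across entities, which is the main thing a panel declaration buys you.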

Additional reference: Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10).
