缺少季度数据的 pandas 数据框中的滞后值和差异 [英] Lag values and differences in pandas dataframe with missing quarterly data

查看:104
本文介绍了缺少季度数据的 pandas 数据框中的滞后值和差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

尽管Pandas具有时序功能,但我仍在努力处理时序数据不完整的数据框.

Though Pandas has time series functionality, I am still struggling with dataframes that have incomplete time series data.

参见下图,下图具有完整的数据,上图具有间隙.两张图片都显示正确的值.红色是我要使用黑色数据计算的列. Cumm_Issd列显示了当年累计发行的股票,MV是市值.

See the pictures below, the lower picture has complete data, the upper has gaps. Both pics show correct values. In red are the columns that I want to calculate using the data in black. Column Cumm_Issd shows the accumulated issued shares during the year, MV is market value.

我想计算每个季度的已发行股数(IssdQtr),市值的季度变化(D_MV_Q)和去年的MV(L_MV_Y).

I want to calculate the issued shares per quarter (IssdQtr), the quarterly change in Market Value (D_MV_Q) and the MV of last year (L_MV_Y).

请参见此链接以获取基本的cvs数据,以获取全部数据和此信息链接.有两个公司10201801020201.

See for underlying cvs data this link for the full data and this link for the gapped data. There are two firms 1020180 and 1020201.

但是,当我尝试使用熊猫shift方法时,如果存在间隙,它将失败,请尝试使用csv文件和下面的代码进行尝试.所有列(DiffEq, Dif1MV, Lag4MV)与IssdQtr, D_MV_Q, L_MV_Y有所不同-分别有几个季度.

However, when I try Pandas shift method it fails when there are gaps, try yourself using the csv files and the code below. All columns (DiffEq, Dif1MV, Lag4MV) differ - for some quarters - from IssdQtr, D_MV_Q, L_MV_Y, respectively.

有没有办法使用熊猫来处理数据中的空白?

Are there ways to deal with gaps in data using Pandas?

import pandas as pd
import numpy as np
import os

dfg = pd.read_csv('example_soverflow_gaps.csv',low_memory=False)
dfg['date']     = pd.to_datetime(dfg['Period'], format='%Y%m%d')
dfg['Q']        = pd.DatetimeIndex(dfg['date']).to_period('Q')
dfg['year']     = dfg['date'].dt.year

dfg['DiffEq']    = dfg.sort_values(['Q']).groupby(['Firm','year'])['Cumm_Issd'].diff()
dfg['Dif1MV']    = dfg.groupby(['Firm'])['MV'].diff(1)
dfg['Lag4MV']    = dfg.groupby(['Firm'])['MV'].shift(4)

差距数据:

完整数据:

推荐答案

通过合并解决了基本问题.首先,创建一个显示滞后日期或季度的变量.在这里,我们要获取去年的MV(返回4个季度):

Solved the basic problem by using a merge. First, create a variable that shows the lagged date or quarter. Here we want last year's MV (4 quarters back):

from pandas.tseries.offsets import QuarterEnd 
dfg['lagQ'] = dfg['date'] + QuarterEnd(-4)

然后使用键(固定和日期)和相关变量(此处为MV)创建一个数据框.

Then create a data-frame with the keys (Firm and date) and the relevant variable (here MV).

lagset=dfg[['Firm','date', 'MV']].copy()
lagset.rename(columns={'MV':'Lag_MV', 'date':'lagQ'}, inplace=True)

最后,将新框架合并到现有框架中:

Lastly, merge the new frame into the existing one:

dfg=pd.merge(dfg, lagset,  on=['Firm', 'lagQ'], how='left')

这篇关于缺少季度数据的 pandas 数据框中的滞后值和差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆