Shifting elements of column based on index given condition on another column


Problem description

I have a dataframe (df) with 2 columns and 1 index.

The index is a datetime index in the format 2001-01-30, etc. It is ordered by date, the dates are monthly, and there are thousands of identical dates. Column A is the company name (corresponding to the date), and Column B is the share price of the company in Column A on the date in the index.

There are multiple companies in Column A for each date, and the companies vary over time (so the data is not fully predictable).

I want to create a Column C which lags all the prices in B forward to the next date (as per the index).

A basic .shift() would not work, as I need the prices to be shifted on the condition that the company is still there at the next point in the index.

I want a Column C which shifts B forward by 1, and a Column D which shifts it back by 1.

I have been stuck on this for a while; somebody very smart, please help.

Thanks

Solution

Consider the example dataframe df below:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.concat(dict(
        A=pd.Series(np.random.rand(10), pd.date_range('2016-09-30', periods=10)),
        B=pd.Series(np.random.rand(7), pd.date_range('2016-09-25', periods=7)),
        C=pd.Series(np.random.rand(10), pd.date_range('2016-09-20', periods=10)),
        D=pd.Series(np.random.rand(8), pd.date_range('2016-10-30', periods=8)),
        E=pd.Series(np.random.rand(12), pd.date_range('2016-10-25', periods=12)),
        F=pd.Series(np.random.rand(14), pd.date_range('2016-08-30', periods=14)),

    )).rename_axis(['ColumnA', None]).reset_index('ColumnA', name='ColumnB')

print(df.head(10))

           ColumnA   ColumnB
2016-09-30       A  0.444939
2016-10-01       A  0.407554
2016-10-02       A  0.460148
2016-10-03       A  0.465239
2016-10-04       A  0.462691
2016-10-05       A  0.016545
2016-10-06       A  0.850445
2016-10-07       A  0.817744
2016-10-08       A  0.777962
2016-10-09       A  0.757983
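
To make the problem with a plain .shift() concrete, here is a minimal sketch on a tiny, made-up frame (the names toy, naive_next and grouped_next and the prices are hypothetical, not from the question). It assumes two companies sharing the same monthly dates, so their rows are interleaved:

```python
import pandas as pd

# Hypothetical monthly data: companies X and Y appear on the same dates.
idx = pd.to_datetime(["2001-01-31", "2001-01-31", "2001-02-28", "2001-02-28"])
toy = pd.DataFrame(
    {"ColumnA": ["X", "Y", "X", "Y"], "ColumnB": [1.0, 10.0, 2.0, 20.0]},
    index=idx,
)

# A plain shift moves values by row position, so X's first row
# picks up Y's price instead of X's next price.
naive_next = toy["ColumnB"].shift(-1)

# Grouping by company first keeps each shift inside one company's rows.
grouped_next = toy.groupby("ColumnA")["ColumnB"].shift(-1)

print(toy.assign(naive=naive_next, grouped=grouped_next))
```

In the naive column the first X row receives Y's price, while in the grouped column it receives X's own next price.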


Use groupby + shift:

d1 = df.set_index('ColumnA', append=True)
g = d1.groupby(level='ColumnA').ColumnB
keys = ['Forward', 'Back']
new_df = d1.join(pd.concat([g.shift(i) for i in [-1, 1]], axis=1, keys=keys))
print(new_df.query('ColumnA == "A"').head(10))

                     ColumnB   Forward      Back
           ColumnA                              
2016-09-30 A        0.444939  0.407554       NaN
2016-10-01 A        0.407554  0.460148  0.444939
2016-10-02 A        0.460148  0.465239  0.407554
2016-10-03 A        0.465239  0.462691  0.460148
2016-10-04 A        0.462691  0.016545  0.465239
2016-10-05 A        0.016545  0.850445  0.462691
2016-10-06 A        0.850445  0.817744  0.016545
2016-10-07 A        0.817744  0.777962  0.850445
2016-10-08 A        0.777962  0.757983  0.817744
2016-10-09 A        0.757983       NaN  0.777962
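
One caveat: groupby + shift takes each company's next available row, even when that company is missing from the date in between. If the condition in the question means the company must be present on the very next date in the index, the following sketch may be closer. It is only an illustration under that assumption; the toy data and the names long, next_date, cand_price and cand_date are hypothetical, and the rows are assumed to be sorted by date.

```python
import pandas as pd

# Hypothetical data: company Y is missing on 2001-02-28.
idx = pd.DatetimeIndex(
    ["2001-01-31", "2001-01-31", "2001-02-28", "2001-03-31", "2001-03-31"],
    name="Date",
)
toy = pd.DataFrame(
    {"ColumnA": ["X", "Y", "X", "X", "Y"],
     "ColumnB": [1.0, 10.0, 2.0, 3.0, 30.0]},
    index=idx,
)

long = toy.reset_index()  # columns: Date, ColumnA, ColumnB

# Map every distinct date to the date that follows it in the index.
dates = pd.DatetimeIndex(long["Date"].unique()).sort_values()
next_date = pd.Series(dates[1:], index=dates[:-1])

# Per-company candidates: the next row's price and the date it belongs to.
g = long.groupby("ColumnA")
cand_price = g["ColumnB"].shift(-1)
cand_date = g["Date"].shift(-1)

# Keep the candidate only if it sits on the immediately following date.
expected = long["Date"].map(next_date)
long["Forward"] = cand_price.where(cand_date == expected)

print(long)
```

Here Y's January row gets NaN in Forward, because Y has no February row, whereas a bare groupby + shift would have pulled in Y's March price. The backward column can be built the same way with shift(1) and a map from each date to the previous one.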
