python2.7 dataframe: add new columns from existing column values

Problem description

I have a dataframe like the following (just an example):

date       y     w   diff
 2010-1-1   3     1    3
 2010-1-2   4     1    4
 2010-1-3   5     1    2
 2010-1-4   6     2    5
 2010-1-5   7     2    6
 2010-1-6   8     2    5
 2010-1-7   9     3    2
 2010-1-8   10    4    4
 2010-1-9   11    5    5
 2010-1-10  12    6    6
 2010-1-11  13    5    6
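
For anyone who wants to reproduce the example, here is a minimal sketch that rebuilds the table above as a pandas DataFrame. The variable name `df` and the choice to keep `date` as plain strings are assumptions; the question does not show how the frame was created.

```python
import pandas as pd

# Rebuild the example table; the dates are kept as strings exactly as shown above.
df = pd.DataFrame({
    'date': ['2010-1-%d' % d for d in range(1, 12)],
    'y': list(range(3, 14)),
    'w': [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 5],
    'diff': [3, 4, 2, 5, 6, 5, 2, 4, 5, 6, 6],
}, columns=['date', 'y', 'w', 'diff'])

print(df)
```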

Let i be the row index of the dataframe. I want to add three new columns, p1, p2 and p3, whose values come from the first two rows of each period, where a period is a block of five rows. In the first two rows of a period, p1, p2 and p3 are NaN. In rows 3-5, p1 and p2 are the y values of the first two rows (3 and 4), and p3 is the diff value of the second of those rows (4). Likewise, in rows 8-10, p1, p2 and p3 are 8, 9 and 2. The new dataframe looks like this:

 date       y     w   diff  p1   p2   p3
 2010-1-1   3     1    3    NaN  NaN  NaN
 2010-1-2   4     1    4    NaN  NaN  NaN
 2010-1-3   5     1    2    3    4    4
 2010-1-4   6     2    5    3    4    4
 2010-1-5   7     2    6    3    4    4
 2010-1-6   8     2    5    NaN  NaN  NaN
 2010-1-7   9     3    2    NaN  NaN  NaN
 2010-1-8   10    4    4    8    9    2
 2010-1-9   11    5    5    8    9    2
 2010-1-10  12    6    6    8    9    2
 2010-1-11  13    5    6    NaN  NaN  NaN

If anything in my question is unclear, please leave a comment. Thanks!

Recommended answer

You can use groupby with an array g created by arange and floor division, so that every five consecutive rows share one group label. Apply a custom function that calls shift(2) on each group and then sets the values in the underlying numpy array as required. Finally, add the result to the original dataframe with join:

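import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# group label per row: rows 0-4 -> 0, rows 5-9 -> 1, ...
g = np.arange(len(df.index)) // 5

def f(x):
    # after shift(2), row 2 of the group holds the values of row 0, row 3 those of row 1
    x = x.shift(2)
    a = x.values.copy()
    # only a full five-row period is rewritten; shorter groups keep the shifted values
    if a.shape[0] >= 5:
        a[3,1] = a[3, 0]   # p2 = y of the second row
        a[3,0] = a[2, 0]   # p1 = y of the first row; a[3,2] is already diff of the second row
        a[2] = a[3]        # rows 3-5 of the period all get (p1, p2, p3)
        a[4] = a[3]
    return pd.DataFrame(a, index=x.index, columns=['p1','p2','p3'])


# select the columns with a list; group_keys=False keeps the original index so join can align
df1 = df.groupby(g, group_keys=False)[['y','w','diff']].apply(f)
print (df1)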
     p1   p2   p3
0   NaN  NaN  NaN
1   NaN  NaN  NaN
2   3.0  4.0  4.0
3   3.0  4.0  4.0
4   3.0  4.0  4.0
5   NaN  NaN  NaN
6   NaN  NaN  NaN
7   8.0  9.0  2.0
8   8.0  9.0  2.0
9   8.0  9.0  2.0
10  NaN  NaN  NaN

df2 = df.join(df1)
print (df2)
         date   y  w  diff   p1   p2   p3
0  2010-01-01   3  1     3  NaN  NaN  NaN
1  2010-01-02   4  1     4  NaN  NaN  NaN
2  2010-01-03   5  1     2  3.0  4.0  4.0
3  2010-01-04   6  2     5  3.0  4.0  4.0
4  2010-01-05   7  2     6  3.0  4.0  4.0
5  2010-01-06   8  2     5  NaN  NaN  NaN
6  2010-01-07   9  3     2  NaN  NaN  NaN
7  2010-01-08  10  4     4  8.0  9.0  2.0
8  2010-01-09  11  5     5  8.0  9.0  2.0
9  2010-01-10  12  6     6  8.0  9.0  2.0
10 2010-01-11  13  5     6  NaN  NaN  NaN
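
As an alternative to the custom function above, the same three columns can probably be built without apply. The sketch below is only one possible approach, not part of the original answer: it shifts the source columns, keeps the values only on the third row of each period, and forward-fills them within the period. The names `period`, `k`, `out` and `result` are made up for illustration, and the example frame is rebuilt inline so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Example data from the question, rebuilt so that the sketch is self-contained.
df = pd.DataFrame({
    'date': pd.to_datetime(['2010-1-%d' % d for d in range(1, 12)]),
    'y': list(range(3, 14)),
    'w': [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 5],
    'diff': [3, 4, 2, 5, 6, 5, 2, 4, 5, 6, 6],
}, columns=['date', 'y', 'w', 'diff'])

period = 5
pos = np.arange(len(df))
g = pos // period   # group label: one label per block of five rows
k = pos % period    # position of each row inside its block (0..4)

# On the third row of a block (k == 2), shift(2) points at the first row of
# the block and shift(1) at the second row.
out = pd.DataFrame({
    'p1': df['y'].shift(2),
    'p2': df['y'].shift(1),
    'p3': df['diff'].shift(1),
}, columns=['p1', 'p2', 'p3'])

# Keep the values only on the third row of each block ...
out.loc[k != 2, :] = np.nan
# ... then copy them down to the fourth and fifth rows of the same block.
out = out.groupby(g).ffill()

result = df.join(out)
print(result)
```

Unlike the function above, this version also fills the third row of a trailing block that has only three or four rows, which may or may not be what you want.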
