python2.7 dataframe: add new columns from existing column values
Question
I have a dataframe like the following (this is just an example).
date        y   w  diff
2010-1-1    3   1  3
2010-1-2    4   1  4
2010-1-3    5   1  2
2010-1-4    6   2  5
2010-1-5    7   2  6
2010-1-6    8   2  5
2010-1-7    9   3  2
2010-1-8    10  4  4
2010-1-9    11  5  5
2010-1-10   12  6  6
2010-1-11   13  5  6
Let i be the row index of the dataframe. I want to add three new columns, p1, p2 and p3, whose values come from the first two dates of each period, where a period is five consecutive rows. In the first two rows of a period, p1, p2 and p3 are NaN. In rows 3-5, p1 and p2 are the y values of the period's first two rows (3 and 4), and p3 is the diff value of the second of those rows, so p3 is 4 throughout rows 3-5. Likewise, in rows 8-10, p1, p2 and p3 are 8, 9 and 2. The new dataframe should look like this:
date        y   w  diff  p1   p2   p3
2010-1-1    3   1  3     NaN  NaN  NaN
2010-1-2    4   1  4     NaN  NaN  NaN
2010-1-3    5   1  2     3    4    4
2010-1-4    6   2  5     3    4    4
2010-1-5    7   2  6     3    4    4
2010-1-6    8   2  5     NaN  NaN  NaN
2010-1-7    9   3  2     NaN  NaN  NaN
2010-1-8    10  4  4     8    9    2
2010-1-9    11  5  5     8    9    2
2010-1-10   12  6  6     8    9    2
2010-1-11   13  5  6     NaN  NaN  NaN
If anything about my question is unclear, please leave a comment. Thanks!
Recommended answer
You can use groupby with an array g, created by arange and floor division, together with a custom function that applies shift and then sets the values in the underlying numpy array as required. Finally, add the result to the original dataframe with join:
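import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# group key: 0 for rows 0-4, 1 for rows 5-9, and so on
g = np.arange(len(df.index)) // 5

def f(x):
    # shift by two rows so the first two rows of each period become NaN
    x = x.shift(2)
    a = x.values.copy()    # copy so the array is writable
    if a.shape[0] > 3:
        a[3,1] = a[3, 0]   # p2 = y of the period's second row
        a[3,0] = a[2, 0]   # p1 = y of the period's first row
        a[2] = a[3]        # repeat the finished row in rows 3 and 5
        a[4] = a[3]
    return pd.DataFrame(a, index=x.index, columns=['p1','p2','p3'])

# select the columns with a list; group_keys=False keeps the original row index
df1 = df.groupby(g, group_keys=False)[['y','w','diff']].apply(f)
print (df1)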
     p1   p2   p3
0   NaN  NaN  NaN
1   NaN  NaN  NaN
2   3.0  4.0  4.0
3   3.0  4.0  4.0
4   3.0  4.0  4.0
5   NaN  NaN  NaN
6   NaN  NaN  NaN
7   8.0  9.0  2.0
8   8.0  9.0  2.0
9   8.0  9.0  2.0
10  NaN  NaN  NaN
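To make the two steps of the answer easier to follow, here is a small sketch that prints the grouping key and then walks through what the custom function does to a single five-row period. The names `period`, `shifted` and `n_rows` are illustrative only and do not appear in the answer; the data is the first period of the example.

```python
import numpy as np
import pandas as pd

# Grouping key: integer-divide the row positions by the period length (5).
n_rows = 11
g = np.arange(n_rows) // 5
print(g)  # [0 0 0 0 0 1 1 1 1 1 2]

# One full period: the first five rows of the example data.
period = pd.DataFrame({"y": [3, 4, 5, 6, 7],
                       "w": [1, 1, 1, 2, 2],
                       "diff": [3, 4, 2, 5, 6]})

# After shift(2), rows 0-1 are NaN and rows 2-4 hold the original rows 0-2.
shifted = period.shift(2)
a = shifted.values.copy()   # writable float array

a[3, 1] = a[3, 0]           # p2 <- y of the period's second row (4)
a[3, 0] = a[2, 0]           # p1 <- y of the period's first row (3)
# a[3, 2] already holds diff of the period's second row (4), which is p3
a[2] = a[3]                 # copy the finished row to rows 2 and 4
a[4] = a[3]
print(a)
```

Rows 0 and 1 of `a` stay NaN, and rows 2 to 4 all end up as 3, 4, 4, which matches the first period of the output above.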
df2 = df.join(df1)
print (df2)
         date   y  w  diff   p1   p2   p3
0  2010-01-01   3  1     3  NaN  NaN  NaN
1  2010-01-02   4  1     4  NaN  NaN  NaN
2  2010-01-03   5  1     2  3.0  4.0  4.0
3  2010-01-04   6  2     5  3.0  4.0  4.0
4  2010-01-05   7  2     6  3.0  4.0  4.0
5  2010-01-06   8  2     5  NaN  NaN  NaN
6  2010-01-07   9  3     2  NaN  NaN  NaN
7  2010-01-08  10  4     4  8.0  9.0  2.0
8  2010-01-09  11  5     5  8.0  9.0  2.0
9  2010-01-10  12  6     6  8.0  9.0  2.0
10 2010-01-11  13  5     6  NaN  NaN  NaN
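For comparison, the same result can probably be reached without editing a numpy array by hand. The sketch below is an alternative, not the answer's method: it rebuilds the sample data from the question, then uses `where` plus `groupby(...).transform("first")` to spread the value at a chosen position of each period over the whole period. The helper `broadcast` and the names `PERIOD`, `pos` and `fill` are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.date_range("2010-01-01", periods=11, freq="D"),
    "y": [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
    "w": [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 5],
    "diff": [3, 4, 2, 5, 6, 5, 2, 4, 5, 6, 6],
})

PERIOD = 5                          # rows per period, as in the question
pos = np.arange(len(df)) % PERIOD   # position of each row inside its period
g = np.arange(len(df)) // PERIOD    # period number of each row


def broadcast(col, at):
    """Spread the value found at position `at` of each period over that period."""
    # Keep only the value at the wanted position, then take the first
    # non-null value per period and repeat it for every row of the period.
    return col.where(pos == at).groupby(g).transform("first")


fill = pos >= 2                     # only rows 3-5 of a period get values
out = df.copy()
out["p1"] = broadcast(df["y"], 0).where(fill)
out["p2"] = broadcast(df["y"], 1).where(fill)
out["p3"] = broadcast(df["diff"], 1).where(fill)
print(out)
```

Because this version never indexes a fixed row number inside a group, a trailing period with fewer than five rows is handled the same way as a full one: its first two rows are NaN and any later rows take the values from its first two rows.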