Create new column in data frame by interpolating other column in between a particular date range - Pandas

Problem description

I have a df with the data shown below.

     Date        y
0   2020-06-14  127
1   2020-06-15  216
2   2020-06-16  4
3   2020-06-17  90
4   2020-06-18  82
5   2020-06-19  70
6   2020-06-20  59
7   2020-06-21  48
8   2020-06-22  23
9   2020-06-23  25
10  2020-06-24  24
11  2020-06-25  22
12  2020-06-26  19
13  2020-06-27  10
14  2020-06-28  18
15  2020-06-29  157
16  2020-06-30  16
17  2020-07-01  14
18  2020-07-02  343

The code to create the data frame:

# Create a dummy dataframe
import pandas as pd
import numpy as np
y0 = [127,216,4,90, 82,70,59,48,23,25,24,22,19,10,18,157,16,14,343]
def initial_forecast(data):
    data['y'] = y0
    return data
# Initial date dataframe
df_dummy = pd.DataFrame({'Date': pd.date_range('2020-06-14', periods=19, freq='1D')})
# Dates
start_date = df_dummy.Date.iloc[1]
print(start_date)
end_date = df_dummy.Date.iloc[17]
print(end_date)
# Adding y0 in the dataframe
df_dummy = initial_forecast(df_dummy)
df_dummy

From the above, I would like to linearly interpolate y over a particular date range: 2020-06-17 to 2020-06-27.

That is, from 2020-06-17 to 2020-06-27 the y value changes from 90 to 10 in 10 steps, so on average it decreases by 8 per step:

(90 - 10) / 10 (number of steps) = 8 per step
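
As a quick check of that arithmetic, the sketch below uses np.linspace to generate the evenly spaced values from 90 down to 10. The variable names (start_value, end_value, n_steps) are illustrative only and are not part of the original question.

```python
import numpy as np

start_value, end_value = 90, 10   # y on 2020-06-17 and on 2020-06-27
n_steps = 10                      # number of daily steps between the two dates

# Average change per step
step = (start_value - end_value) / n_steps

# n_steps + 1 points, because both endpoints are included
line = np.linspace(start_value, end_value, n_steps + 1)

print(step)
print(line)
```

The second value of line is 82, which happens to equal the original y on 2020-06-18; that is why the first visible change in the expected output appears on 2020-06-19.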

Expected output:

     Date        y       y_new
0   2020-06-14  127      127
1   2020-06-15  216      216
2   2020-06-16  4        4
3   2020-06-17  90       90
4   2020-06-18  82       82
5   2020-06-19  70       74
6   2020-06-20  59       66
7   2020-06-21  48       58
8   2020-06-22  23       50
9   2020-06-23  25       42
10  2020-06-24  24       34
11  2020-06-25  22       26
12  2020-06-26  19       18  
13  2020-06-27  10       10
14  2020-06-28  18       18
15  2020-06-29  157      157
16  2020-06-30  16       16
17  2020-07-01  14       14
18  2020-07-02  343      343

Note: Outside that date range, y_new should be the same as y.

I tried the code below, but it does not give the desired output.

# Function
def df_interpolate(df, start_date, end_date): 
    df["Date"]=pd.to_datetime(df["Date"])
    df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date), 'y_new'] = np.nan
    df['y_new'] = df['y'].interpolate().round()
    return df
df1 = df_interpolate(df_dummy, '2020-06-17', '2020-06-27')


Recommended answer

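With some tweaks to your function it works: use np.where to create the new column, remove the = from your conditionals, and cast to int as per your expected output. Your version interpolated df['y'], which has no missing values, so y_new came out as a copy of y. With >= and <= the endpoint values 90 and 10 would also have been blanked out, leaving the interpolation to run between the neighbouring rows (4 and 18) instead.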

def df_interpolate(df, start_date, end_date): 
    df["Date"] = pd.to_datetime(df["Date"])
    df['y_new'] = np.where((df['Date'] > start_date) & (df['Date'] < end_date), np.nan, df['y'])
    df['y_new'] = df['y_new'].interpolate().round().astype(int)
    return df

         Date    y  y_new
0  2020-06-14  127    127
1  2020-06-15  216    216
2  2020-06-16    4      4
3  2020-06-17   90     90
4  2020-06-18   82     82
5  2020-06-19   70     74
6  2020-06-20   59     66
7  2020-06-21   48     58
8  2020-06-22   23     50
9  2020-06-23   25     42
10 2020-06-24   24     34
11 2020-06-25   22     26
12 2020-06-26   19     18
13 2020-06-27   10     10
14 2020-06-28   18     18
15 2020-06-29  157    157
16 2020-06-30   16     16
17 2020-07-01   14     14
18 2020-07-02  343    343
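
For anyone who wants to reproduce the result end to end, here is a self-contained sketch of the same idea. It is a variation rather than the answer's exact code: the function name interpolate_between and its col / new_col parameters are hypothetical, it uses Series.mask instead of np.where, and it returns a copy rather than modifying the input frame.

```python
import numpy as np
import pandas as pd

y0 = [127, 216, 4, 90, 82, 70, 59, 48, 23, 25, 24, 22, 19, 10, 18, 157, 16, 14, 343]
df = pd.DataFrame({
    "Date": pd.date_range("2020-06-14", periods=len(y0), freq="D"),
    "y": y0,
})

def interpolate_between(df, start_date, end_date, col="y", new_col="y_new"):
    """Return a copy of df with new_col linearly interpolated strictly
    between start_date and end_date; elsewhere new_col equals col."""
    out = df.copy()
    dates = pd.to_datetime(out["Date"])
    # Strict inequalities: the endpoint rows keep their values and act as anchors
    inside = (dates > pd.Timestamp(start_date)) & (dates < pd.Timestamp(end_date))
    # Blank out only the interior rows, then fill them linearly
    out[new_col] = out[col].astype(float).mask(inside)
    out[new_col] = out[new_col].interpolate().round().astype(int)
    return out

result = interpolate_between(df, "2020-06-17", "2020-06-27")
print(result)
```

Note that the default linear method of interpolate treats rows as equally spaced, which matches daily data like this. If the dates were irregular, one option would be to set Date as the index and pass method="time" to interpolate.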
