通过在特定日期范围之间插入其他列来在数据框中创建新列-Pandas [英] Create new column in data frame by interpolating other column in between a particular date range - Pandas
问题描述
我有一个df,如下所示。
I have a df as shown below.
数据是这样的。
Date y
0 2020-06-14 127
1 2020-06-15 216
2 2020-06-16 4
3 2020-06-17 90
4 2020-06-18 82
5 2020-06-19 70
6 2020-06-20 59
7 2020-06-21 48
8 2020-06-22 23
9 2020-06-23 25
10 2020-06-24 24
11 2020-06-25 22
12 2020-06-26 19
13 2020-06-27 10
14 2020-06-28 18
15 2020-06-29 157
16 2020-06-30 16
17 2020-07-01 14
18 2020-07-02 343
用于创建数据框的代码。
The code to create the data frame.
# Create a dummy dataframe
import pandas as pd
import numpy as np
y0 = [127,216,4,90, 82,70,59,48,23,25,24,22,19,10,18,157,16,14,343]
def initial_forecast(data):
data['y'] = y0
return data
# Initial date dataframe
df_dummy = pd.DataFrame({'Date': pd.date_range('2020-06-14', periods=19, freq='1D')})
# Dates
start_date = df_dummy.Date.iloc[1]
print(start_date)
end_date = df_dummy.Date.iloc[17]
print(end_date)
# Adding y0 in the dataframe
df_dummy = initial_forecast(df_dummy)
df_dummy
从上面我想在特定日期范围内插值数据。
From the above I would like to interpolate the data for a particular date range.
我想在2020-06-17至2020-06-27之间插值(线性)。
I would like to interpolate(linear) between 2020-06-17 to 2020-06-27.
ie 从2020-06-17到2020-06-27, y值从90变为10,分10步
。因此平均每步减少8。
ie from 2020-06-17 to 2020-06-27 'y' values changes from 90 to 10 in 10 steps
. so at an average in each step it reduces 8.
ie (90-10)/10(number of steps) = 8 in each steps
预期输出:
Date y y_new
0 2020-06-14 127 127
1 2020-06-15 216 216
2 2020-06-16 4 4
3 2020-06-17 90 90
4 2020-06-18 82 82
5 2020-06-19 70 74
6 2020-06-20 59 66
7 2020-06-21 48 58
8 2020-06-22 23 50
9 2020-06-23 25 42
10 2020-06-24 24 34
11 2020-06-25 22 26
12 2020-06-26 19 18
13 2020-06-27 10 10
14 2020-06-28 18 18
15 2020-06-29 157 157
16 2020-06-30 16 16
17 2020-07-01 14 14
18 2020-07-02 343 343
注意:在剩余的日期范围内,y_new值应与y值相同。
Note: In the remaining date range y_new value should be same as y value.
我尝试了以下代码,但未给出d所需的输出
I tried below code, that is not giving desired output
# Function
def df_interpolate(df, start_date, end_date):
df["Date"]=pd.to_datetime(df["Date"])
df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date), 'y_new'] = np.nan
df['y_new'] = df['y'].interpolate().round()
return df
df1 = df_interpolate(df_dummy, '2020-06-17', '2020-06-27')
推荐答案
通过一些功能上的调整,它可以正常工作。 np.where
创建新列,从条件中删除 =
并强制转换为 int
根据您的预期输出。
With some tweaks to your function it works. np.where
to create the new column, removing the =
from your conditionals, and casting to int
as per your expected output.
def df_interpolate(df, start_date, end_date):
df["Date"] = pd.to_datetime(df["Date"])
df['y_new'] = np.where((df['Date'] > start_date) & (df['Date'] < end_date), np.nan, df['y'])
df['y_new'] = df['y_new'].interpolate().round().astype(int)
return df
Date y y_new
0 2020-06-14 127 127
1 2020-06-15 216 216
2 2020-06-16 4 4
3 2020-06-17 90 90
4 2020-06-18 82 82
5 2020-06-19 70 74
6 2020-06-20 59 66
7 2020-06-21 48 58
8 2020-06-22 23 50
9 2020-06-23 25 42
10 2020-06-24 24 34
11 2020-06-25 22 26
12 2020-06-26 19 18
13 2020-06-27 10 10
14 2020-06-28 18 18
15 2020-06-29 157 157
16 2020-06-30 16 16
17 2020-07-01 14 14
18 2020-07-02 343 343
这篇关于通过在特定日期范围之间插入其他列来在数据框中创建新列-Pandas的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!