根据不同类型的值得出日期列 [英] Derive date column based on different types of values

查看：79 发布时间：2020/5/2 8:18:45 python pandas list dataframe python-datetime

本文介绍了根据不同类型的值得出日期列的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个如下所示的数据框

I have a dataframe like as shown below

df = pd.DataFrame({'subject_id' :[1,2,3,4,5],
                        'date_of_interview':['2007-05-27','2008-03-13','2010-11-19','2011-10-05','2004-11-02'],
                        'Age':[31,35,78,72,43],
                        'value':[6,0.33,1990,np.nan,2001],
                        'age_detected':[25,35,98,65,40]})
df['date_of_interview'] = pd.to_datetime(df['date_of_interview'])

我想基于value和age_detected列创建一个名为dis_date的新列

I would like to create a new column called dis_date based on value and age_detected column

例如:subject_id = 1的date_of_interview为2007-05-27.如果我们查看他的值"列，我们可以看到他的值为6，这意味着我们必须从date_of_interview中减去6年才能将2001-05-27作为dis_date

Ex: subject_id = 1 has date_of_interview as 2007-05-27. If we look at his value column, we can see that he has a value of 6 which means we have to subtract 6 years from the date_of_interview to get 2001-05-27 as dis_date

如果您查看subject_id = 3，则他在值"列中有一个年份值，因此他的dis_date将为1990-11-19

Whereas if you look at subject_id = 3, he has a year value in value column, so his dis_date will be 1990-11-19

当value列中有NA时，我们必须查看他的age_detected列并将其从Age中减去以得出年数.

When there is NA in value column, we have to look at his age_detected column and subtract it from Age to get the number of years.

例如:subject_id = 4的Age为72，AGE_DETECTED为65.现在diff为7，而他的dis_date将为2004-10-05

Ex: subject_id = 4 has Age as 72 and AGE_DETECTED as 65. now the diff is 7 and his dis_date will be 2004-10-05

如果少于6位数字表示年份，请在值"列中注明值.如果为1，则表示减去1年.如果是0.33，则表示减去4个月. 1年= 12个月. 0.33 = 3.96个月(4个月)

Please note values in value column if less than 6 digits represent no of years. If it's 1, it means subtract 1 year. If it's 0.33 meaning subtract 4 months. 1 year = 12 months. 0.33 = 3.96 months (4 months)

我正在尝试类似的方法，但没有帮助

I was trying something like this but it doesn't help

for i in range(len(df['value'])):

    if (len(str(df['value'][i]))) < 6:
        df['dis_date'] = df['date_of_interview'] - pd.DateOffset(years=df['value'][i])

我希望我的输出如下所示

I expect my output to be like as shown below

推荐答案

在此解决方案中，将创建帮助器列以验证替换的年份或减去的月份:

In this solution are created helper columns for verify replaced years or subtracted months:

#if value less like 1 multiple by 12, another values set to NaNs
df['m1'] = np.where(df['value'].lt(1), df['value'].mul(12).round(), np.nan)
#if values more like 1000 it is year
df['y1'] = df['value'].where(df['value'].gt(1000))

#if values between 1, 1000 is necessary subtract years from value column
y1 = df['Age'].sub(df['age_detected'])
df['y2'] = np.where(y1.between(1, 1000), df['date_of_interview'].dt.year.sub(y1), np.nan)
#joined years to one column
df['y'] = df['y1'].fillna(df['y2'])

#replaced years by another column
f1 = lambda x: x['date_of_interview'] - pd.DateOffset(year=(int(x['y'])))
df['dis_date1'] = df.dropna(subset=['date_of_interview','y']).apply(f1, axis=1)
#subtracted months if non missing values
f1 = lambda x: x['date_of_interview'] - pd.DateOffset(months=(int(x['m1'])))
df['dis_date2'] = df.dropna(subset=['m1']).apply(f1, axis=1)

#join together
df['dis_date'] = df['dis_date1'].fillna(df['dis_date2'])
print (df)
   subject_id date_of_interview  Age    value  age_detected   m1      y1  \
0           1        2007-05-27   31     6.00            25  NaN     NaN   
1           2        2008-03-13   35     0.33            35  4.0     NaN   
2           3        2010-11-19   78  1990.00            98  NaN  1990.0   
3           4        2011-10-05   72      NaN            65  NaN     NaN   
4           5        2004-11-02   43  2001.00            40  NaN  2001.0   

       y2       y  dis_date1  dis_date2   dis_date  
0  2001.0  2001.0 2001-05-27        NaT 2001-05-27  
1     NaN     NaN        NaT 2007-11-13 2007-11-13  
2     NaN  1990.0 1990-11-19        NaT 1990-11-19  
3  2004.0  2004.0 2004-10-05        NaT 2004-10-05  
4  2001.0  2001.0 2001-11-02        NaT 2001-11-02

这篇关于根据不同类型的值得出日期列的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

根据不同类型的值得出日期列 [英] Derive date column based on different types of values

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

根据不同类型的值得出日期列 [英] Derive date column based on different types of values

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭