Pandas get the age from a date (example: date of birth)
Question
How can I calculate the age of a person (based on the dob column) and add a column to the dataframe with the new value?
The dataframe looks like this:
lname fname dob
0 DOE LAURIE 03011979
1 BOURNE JASON 06111978
2 GRINCH XMAS 12131988
3 DOE JOHN 11121986
I tried doing the following:
now = datetime.now()
df1['age'] = now - df1['dob']
but got the following error:
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'str'
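The TypeError arises because dob still holds strings such as '03011979', and Python has no subtraction between a datetime and a str. A minimal sketch for the question's own frame, assuming dob is kept as strings (reading it as integers would drop the leading zero): these values carry four-digit years, so format='%m%d%Y' parses them with no century ambiguity, after which the original subtraction works and produces a timedelta column.

```python
import pandas as pd

# The question's frame, with dob kept as strings so leading zeros survive
df1 = pd.DataFrame({
    'lname': ['DOE', 'BOURNE', 'GRINCH', 'DOE'],
    'fname': ['LAURIE', 'JASON', 'XMAS', 'JOHN'],
    'dob': ['03011979', '06111978', '12131988', '11121986'],
})

# Parse MMDDYYYY strings into datetime64 values; four-digit years need no century fix
df1['dob'] = pd.to_datetime(df1['dob'], format='%m%d%Y')

now = pd.Timestamp('now')
# The question's original line now works: Timestamp minus a datetime64 column
# gives a timedelta64 column
df1['age'] = now - df1['dob']
print(df1)
```

The result is still a duration rather than a number of years; the answer below shows how to turn it into years.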
Answer
import io

import pandas as pd

# Columns are separated by two or more spaces, so values such as
# "BUDG ANAL" that contain a single space stay in one field
content = '''     ssno    lname      fname      pos_title             ser   gender  dob
0   23456789  PLILEY     JODY       BUDG ANAL             0560  F       031871
1  987654321  NOEL       HEATHER    PRTG SRVCS SPECLST    1654  F       120852
2  234567891  SONJU      LAURIE     SUPVY CONTR SPECLST   1102  F       010999
3  345678912  MANNING    CYNTHIA    SOC SCNTST            0101  F       081692
4  456789123  NAUERTZ    ELIZABETH  OFF AUTOMATION ASST   0326  F       031387'''
# Regex separators are handled by the Python engine; naming it avoids a ParserWarning
df = pd.read_csv(io.StringIO(content), sep=r'\s{2,}', engine='python')
# dob is parsed as an integer, which drops leading zeros; pad back to MMDDYY
df['dob'] = df['dob'].apply('{:06}'.format)
now = pd.Timestamp('now')
df['dob'] = pd.to_datetime(df['dob'], format='%m%d%y')  # 1
# 2: move dates that landed in the future back by 100 calendar years
df['dob'] = df['dob'].where(df['dob'] < now, df['dob'] - pd.DateOffset(years=100))
# 3: whole 365.2425-day years elapsed (astype('<m8[Y]') fails on pandas >= 2.0)
df['age'] = ((now - df['dob']).dt.days / 365.2425).astype(int)
print(df)
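yields (as originally run; the ages depend on the date the script is run, and the 18:00:00 in row 1 comes from subtracting np.timedelta64(100, 'Y'), a fixed span of 100 × 365.2425 days, rather than 100 calendar years)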
ssno lname fname pos_title ser gender \
0 23456789 PLILEY JODY BUDG ANAL 560 F
1 987654321 NOEL HEATHER PRTG SRVCS SPECLST 1654 F
2 234567891 SONJU LAURIE SUPVY CONTR SPECLST 1102 F
3 345678912 MANNING CYNTHIA SOC SCNTST 101 F
4 456789123 NAUERTZ ELIZABETH OFF AUTOMATION ASST 326 F
dob age
0 1971-03-18 00:00:00 43
1 1952-12-08 18:00:00 61
2 1999-01-09 00:00:00 15
3 1992-08-16 00:00:00 22
4 1987-03-13 00:00:00 27
1. It looks like your dob column currently holds strings. First, convert them to Timestamps using pd.to_datetime.
2. The format '%m%d%y' converts the last two digits to years, but unfortunately assumes 52 means 2052 (two-digit years 00-68 are read as 2000-2068, and 69-99 as 1969-1999). Since that's probably not Heather Noel's birth year, let's subtract 100 years from dob whenever dob is greater than now. You may want to subtract a few years from now in the condition df['dob'] < now, since it may be slightly more likely to have a 101-year-old worker than a 1-year-old worker.
3. You can subtract dob from now to obtain a timedelta64[ns]. To convert that to years, use astype('<m8[Y]') or astype('timedelta64[Y]'). Recent pandas versions (2.0 and later) no longer allow casting a timedelta to year resolution; there, ((now - df['dob']).dt.days / 365.2425).astype(int) gives the same truncated count of 365.2425-day years.
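The answer approximates a year as 365.2425 days, and point 2 suggests moving the cutoff a few years before now. A minimal sketch of both ideas, using two hypothetical helpers, parse_mmddyy and whole_years (neither is pandas API): ages are counted as completed calendar years by checking whether this year's birthday has passed, and min_age is an assumed parameter for the youngest age you consider plausible. The reference date is fixed so the result is reproducible.

```python
import pandas as pd


def parse_mmddyy(s: pd.Series, now: pd.Timestamp, min_age: int = 0) -> pd.Series:
    """Hypothetical helper: parse MMDDYY strings, then move any date that is not
    earlier than (now - min_age years) back by 100 calendar years."""
    dob = pd.to_datetime(s, format='%m%d%y')
    cutoff = now - pd.DateOffset(years=min_age)
    return dob.where(dob < cutoff, dob - pd.DateOffset(years=100))


def whole_years(dob: pd.Series, now: pd.Timestamp) -> pd.Series:
    """Hypothetical helper: completed calendar years between each dob and now."""
    # True where this year's birthday (month, day) is still ahead of now
    before_birthday = (dob.dt.month > now.month) | (
        (dob.dt.month == now.month) & (dob.dt.day > now.day)
    )
    return now.year - dob.dt.year - before_birthday.astype(int)


# Usage with the answer's dob values and a fixed reference date
now = pd.Timestamp('2014-10-01')
s = pd.Series(['031871', '120852', '010999', '081692', '031387'])
dob = parse_mmddyy(s, now, min_age=10)
print(whole_years(dob, now).tolist())  # → [43, 61, 15, 22, 27]
```

Choosing min_age is a trade-off: with min_age=16, the 1999 birth date above would be read as 1899, so it should stay below the youngest age you actually expect. For the question's own data, whose dob values have four-digit years, parse with format='%m%d%Y' and pass the result straight to whole_years; no century correction is needed.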