Pandas get the age from a date (example: date of birth)

Question

How can I calculate the age of a person (based on the dob column) and add a column to the dataframe with the new value?

The dataframe looks like this:

    lname      fname     dob
0    DOE       LAURIE    03011979
1    BOURNE    JASON     06111978
2    GRINCH    XMAS      12131988
3    DOE       JOHN      11121986

I tried the following:

now = datetime.now()
df1['age'] = now - df1['dob']

However, I received the following error:

TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'str'

Recommended answer

import datetime as DT
import io
import numpy as np
import pandas as pd

pd.options.mode.chained_assignment = 'warn'

content = '''     ssno        lname         fname    pos_title             ser  gender  dob 
0    23456789    PLILEY     JODY        BUDG ANAL             0560  F      031871 
1    987654321   NOEL       HEATHER     PRTG SRVCS SPECLST    1654  F      120852
2    234567891   SONJU      LAURIE      SUPVY CONTR SPECLST   1102  F      010999
3    345678912   MANNING    CYNTHIA     SOC SCNTST            0101  F      081692
4    456789123   NAUERTZ    ELIZABETH   OFF AUTOMATION ASST   0326  F      031387'''

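# Columns are separated by two or more spaces; a regex separator needs the Python engine.
df = pd.read_csv(io.StringIO(content), sep=r'\s{2,}', engine='python')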
df['dob'] = df['dob'].apply('{:06}'.format)

now = pd.Timestamp('now')
df['dob'] = pd.to_datetime(df['dob'], format='%m%d%y')    # 1
df['dob'] = df['dob'].where(df['dob'] < now, df['dob'] -  np.timedelta64(100, 'Y'))   # 2
df['age'] = (now - df['dob']).astype('<m8[Y]')    # 3
print(df)

yields

        ssno    lname      fname            pos_title   ser gender  \
0   23456789   PLILEY       JODY            BUDG ANAL   560      F   
1  987654321     NOEL    HEATHER   PRTG SRVCS SPECLST  1654      F   
2  234567891    SONJU     LAURIE  SUPVY CONTR SPECLST  1102      F   
3  345678912  MANNING    CYNTHIA           SOC SCNTST   101      F   
4  456789123  NAUERTZ  ELIZABETH  OFF AUTOMATION ASST   326      F   

                  dob  age  
0 1971-03-18 00:00:00   43  
1 1952-12-08 18:00:00   61  
2 1999-01-09 00:00:00   15  
3 1992-08-16 00:00:00   22  
4 1987-03-13 00:00:00   27  


  1. It looks like your dob column currently holds strings. First, convert them to Timestamps using pd.to_datetime.
  2. The format '%m%d%y' converts the last two digits to a year, but unfortunately assumes 52 means 2052. Since that's probably not Heather Noel's birth year, subtract 100 years from dob whenever dob is later than now. You may want to subtract a few years from now in the condition df['dob'] < now, since a 101-year-old worker may be slightly more likely than a 1-year-old worker.
  3. You can subtract dob from now to obtain a timedelta64[ns]. To convert that to years, use astype('<m8[Y]') or astype('timedelta64[Y]').
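
Recent pandas releases may reject the year unit ('Y') in both np.timedelta64(100, 'Y') arithmetic and astype('<m8[Y]'), because a year is not a fixed length of time. The sketch below is one possible alternative, not the answer's original method: it shifts dates with pd.DateOffset and computes age by comparing calendar fields. The helper names age_in_years and parse_two_digit_dob are hypothetical, and the fixed reference date is an assumption chosen only so the result is reproducible.

```python
import pandas as pd


def age_in_years(dob: pd.Series, today: pd.Timestamp) -> pd.Series:
    """Whole years elapsed between each date of birth and `today`."""
    # Subtract the birth year, then take one year off for anyone whose
    # birthday has not yet occurred in today's calendar year.
    before_birthday = (dob.dt.month > today.month) | (
        (dob.dt.month == today.month) & (dob.dt.day > today.day)
    )
    return today.year - dob.dt.year - before_birthday.astype(int)


def parse_two_digit_dob(raw: pd.Series, today: pd.Timestamp) -> pd.Series:
    """Parse MMDDYY strings, moving any future date back one century."""
    dob = pd.to_datetime(raw, format="%m%d%y")
    # DateOffset shifts by calendar years, so month and day are preserved.
    return dob.where(dob <= today, dob - pd.DateOffset(years=100))


# Fixed reference date (an assumption for reproducibility); use
# pd.Timestamp("today").normalize() to work from the current date.
today = pd.Timestamp("2014-09-01")

df = pd.DataFrame({
    "lname": ["PLILEY", "NOEL", "SONJU", "MANNING", "NAUERTZ"],
    "dob": ["031871", "120852", "010999", "081692", "031387"],
})
df["dob"] = parse_two_digit_dob(df["dob"], today)
df["age"] = age_in_years(df["dob"], today)
print(df)
```

For the four-digit years in the question (for example 03011979), pd.to_datetime(df1['dob'], format='%m%d%Y') parses the strings directly, so the century correction is unnecessary and age_in_years can be applied to the result as is.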
