将多种日期格式解析为一种格式 [英] Parse multiple date formats into a single format

查看:48
本文介绍了将多种日期格式解析为一种格式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一列称为发布(日期).如您所见,它具有多种日期格式以及nan值.我想跳过nan值,将所有其他格式都转换为%Y-%-%d,并忽略只有一年的格式.

I have one column called published (date). As you can see, it has multiple date formats and also nan values. I would like to skip nan values, convert all the other formats to %Y-%-%d, and ignore the one that has the only year.

我尝试了df ['publish_time'] = pd.to_datetime(df ['publish_time'])还有类似的东西:

I tried df['publish_time']=pd.to_datetime(df['publish_time']) and also things like:

fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y',
              '%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month','month: 
               %d;Year','%Y','%b %d %Y','%b %Y %d']

但是我无法解决.有什么建议么?谢谢!

but I could not solve it. Any suggestions? Thanks!

这是该列:

published

2014 Jul 22
2003 Aug
2019 Nov 26
2012-12-07
2020 Jan 21
2015-01-01
2010-11-30
2007-05-10
2020
2012-02-29
2016 Apr 19
2006-12-31
2013 Jun 27
2019 Jun 19
2015 Jun 12
2006 Jun-Dec
2006-07-31
nan
2017-04-15
2016 May 22
2020 Feb
2017 May 6
2020 Mar 11
2013-04-30
2020-03-07
nan
2018

推荐答案

首先在 fmt 列表中添加了2种新格式:

First was added 2 new formats to fmt list:

fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y',
     '%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month',
     'month: %d;Year','%Y','%b %d %Y','%b %Y %d',
     '%Y %b %d', '%Y %b']

然后在列表理解中将列转换为日期时间,参数 errors ='coerce'用于将不匹配的值转换为缺少的值.由 concat 最后加入一起.

Then in list comprehension convert column to datetimes, parameter errors='coerce' is for non matched values to missing values. Last join together by concat.

最后一次是因为每行可能有多个值,因为使用 dd/mm/YYYY mm/dd/YYYY 格式(不确定一天中的月份)来回填选择第一列.这意味着将以高优先级选择列表中的第一格式.

Last because possible multiple values per rows because dd/mm/YYYY vs mm/dd/YYYY formats (not sure if month of day) is used back filling with select first column. It means what format is first in list it is selected with high priority.

dfs = [pd.to_datetime(df['publish_time'], format=f, errors='coerce') for f in fmt]
df['publish_time1']= pd.concat(dfs, axis=1).bfill(axis=1).iloc[:, 0]


print (df)
    publish_time publish_time1
0    2014 Jul 22    2014-07-22
1       2003 Aug    2003-08-01
2    2019 Nov 26    2019-11-26
3     2012-12-07    2012-12-07
4    2020 Jan 21    2020-01-21
5     2015-01-01    2015-01-01
6     2010-11-30    2010-11-30
7     2007-05-10    2007-05-10
8           2020    2020-01-01
9     2012-02-29    2012-02-29
10   2016 Apr 19    2016-04-19
11    2006-12-31    2006-12-31
12   2013 Jun 27    2013-06-27
13   2019 Jun 19    2019-06-19
14   2015 Jun 12    2015-06-12
15  2006 Jun-Dec           NaT
16    2006-07-31    2006-07-31
17           NaN           NaT
18    2017-04-15    2017-04-15
19   2016 May 22    2016-05-22
20      2020 Feb    2020-02-01
21    2017 May 6    2017-05-06
22   2020 Mar 11    2020-03-11
23    2013-04-30    2013-04-30
24    2020-03-07    2020-03-07
25           NaN           NaT
26          2018    2018-01-01

这篇关于将多种日期格式解析为一种格式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆