将多种日期格式解析为一种格式 [英] Parse multiple date formats into a single format
问题描述
我有一列称为发布(日期).如您所见,它具有多种日期格式以及nan值.我想跳过nan值,将所有其他格式都转换为%Y-%-%d,并忽略只有一年的格式.
I have one column called published (date). As you can see, it has multiple date formats and also nan values. I would like to skip nan values, convert all the other formats to %Y-%-%d, and ignore the one that has the only year.
我尝试了df ['publish_time'] = pd.to_datetime(df ['publish_time'])还有类似的东西:
I tried df['publish_time']=pd.to_datetime(df['publish_time']) and also things like:
fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y',
'%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month','month:
%d;Year','%Y','%b %d %Y','%b %Y %d']
但是我无法解决.有什么建议么?谢谢!
but I could not solve it. Any suggestions? Thanks!
这是该列:
published
2014 Jul 22
2003 Aug
2019 Nov 26
2012-12-07
2020 Jan 21
2015-01-01
2010-11-30
2007-05-10
2020
2012-02-29
2016 Apr 19
2006-12-31
2013 Jun 27
2019 Jun 19
2015 Jun 12
2006 Jun-Dec
2006-07-31
nan
2017-04-15
2016 May 22
2020 Feb
2017 May 6
2020 Mar 11
2013-04-30
2020-03-07
nan
2018
推荐答案
首先在 fmt
列表中添加了2种新格式:
First was added 2 new formats to fmt
list:
fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y',
'%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month',
'month: %d;Year','%Y','%b %d %Y','%b %Y %d',
'%Y %b %d', '%Y %b']
然后在列表理解中将列转换为日期时间,参数 errors ='coerce'
用于将不匹配的值转换为缺少的值.由 concat
最后加入一起.
Then in list comprehension convert column to datetimes, parameter errors='coerce'
is for non matched values to missing values. Last join together by concat
.
最后一次是因为每行可能有多个值,因为使用 dd/mm/YYYY
与 mm/dd/YYYY
格式(不确定一天中的月份)来回填选择第一列.这意味着将以高优先级选择列表中的第一格式.
Last because possible multiple values per rows because dd/mm/YYYY
vs mm/dd/YYYY
formats (not sure if month of day) is used back filling with select first column. It means what format is first in list it is selected with high priority.
dfs = [pd.to_datetime(df['publish_time'], format=f, errors='coerce') for f in fmt]
df['publish_time1']= pd.concat(dfs, axis=1).bfill(axis=1).iloc[:, 0]
print (df)
publish_time publish_time1
0 2014 Jul 22 2014-07-22
1 2003 Aug 2003-08-01
2 2019 Nov 26 2019-11-26
3 2012-12-07 2012-12-07
4 2020 Jan 21 2020-01-21
5 2015-01-01 2015-01-01
6 2010-11-30 2010-11-30
7 2007-05-10 2007-05-10
8 2020 2020-01-01
9 2012-02-29 2012-02-29
10 2016 Apr 19 2016-04-19
11 2006-12-31 2006-12-31
12 2013 Jun 27 2013-06-27
13 2019 Jun 19 2019-06-19
14 2015 Jun 12 2015-06-12
15 2006 Jun-Dec NaT
16 2006-07-31 2006-07-31
17 NaN NaT
18 2017-04-15 2017-04-15
19 2016 May 22 2016-05-22
20 2020 Feb 2020-02-01
21 2017 May 6 2017-05-06
22 2020 Mar 11 2020-03-11
23 2013-04-30 2013-04-30
24 2020-03-07 2020-03-07
25 NaN NaT
26 2018 2018-01-01
这篇关于将多种日期格式解析为一种格式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!