Extract date from string datetime column in pandas
Problem description
I have a column cash_date in a pandas dataframe which is of object dtype. I am not able to use the pandas to_datetime function here. The shape of my data frame is (47654566, 5). My data frame looks like this:
cash_date amount id
02-JAN-13 12.00.00.000000000 AM 100 1
13-FEB-13 12.00.00.000000000 AM 200 2
09-MAR-13 12.00.00.000000000 AM 300 3
03-APR-13 12.00.00.000000000 AM 400 4
02-JAN-13 06.26.02.438000000 PM 500 7
17-NOV-18 08.31.47.443000000 PM 700 8
I tried the following:
df.cash_date = pd.to_datetime(df['cash_date'], errors='coerce') # Not working
for i in range(len(df)):
    df.cash_date = df.cash_date.astype(str).str.split('\d\d.\d\d.\d\d.\d\d\d\d\d\d\d\d\d')[i][0] # Not working
I want the data frame to look like this:
cash_date amount id date
02-JAN-13 12.00.00.000000000 AM 100 1 02-JAN-13
13-FEB-13 12.00.00.000000000 AM 200 2 13-FEB-13
09-MAR-13 12.00.00.000000000 AM 300 3 09-MAR-13
03-APR-13 12.00.00.000000000 AM 400 4 03-APR-13
02-JAN-13 06.26.02.438000000 PM 500 7 02-JAN-13
17-NOV-18 08.31.47.443000000 PM 700 8 17-NOV-18
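Recommended answer
Specify the format=... argument. Because the times carry an AM/PM marker, parse the hour with %I (12-hour clock); with %H, the %p marker does not change the parsed hour.
pd.to_datetime(df['cash_date'], format='%d-%b-%y %I.%M.%S.%f %p', errors='coerce')
0 2013-01-02 00:00:00.000
1 2013-02-13 00:00:00.000
2 2013-03-09 00:00:00.000
3 2013-04-03 00:00:00.000
4 2013-01-02 18:26:02.438
5 2018-11-17 20:31:47.443
Name: cash_date, dtype: datetime64[ns]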
Details about acceptable formats may be found at http://strftime.org.
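One detail from that directive list matters for this data: %p (AM/PM) only changes the parsed hour when the hour itself is read with %I. Below is a minimal sketch using the standard library's datetime.strptime. The sample string is made up for illustration, shortened from the question's data with the fractional seconds left out, and an English locale is assumed for the month abbreviation.

```python
from datetime import datetime

# Illustrative value, shortened from the question's data (no fractional seconds).
sample = "02-JAN-13 06.26.02 PM"

# %I reads a 12-hour clock value, so %p can shift it into the afternoon.
with_12h = datetime.strptime(sample, "%d-%b-%y %I.%M.%S %p")

# %H reads a 24-hour clock value; %p is matched but does not change the hour.
with_24h = datetime.strptime(sample, "%d-%b-%y %H.%M.%S %p")

print(with_12h.hour, with_24h.hour)
```

The date part is the same either way, so the choice only matters if the time of day is kept.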
From here, you can floor the datetimes using dt.floor:
df['date'] = pd.to_datetime(
    df['cash_date'], format='%d-%b-%y %I.%M.%S.%f %p', errors='coerce'
).dt.floor('D')
df
cash_date amount id date
0 02-JAN-13 12.00.00.000000000 AM 100 1 2013-01-02
1 13-FEB-13 12.00.00.000000000 AM 200 2 2013-02-13
2 09-MAR-13 12.00.00.000000000 AM 300 3 2013-03-09
3 03-APR-13 12.00.00.000000000 AM 400 4 2013-04-03
4 02-JAN-13 06.26.02.438000000 PM 500 7 2013-01-02
5 17-NOV-18 08.31.47.443000000 PM 700 8 2018-11-17
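Flooring keeps the column as datetime64, which is why the dates display as 2013-01-02 rather than 02-JAN-13. If strings in the question's original layout are wanted after parsing, one option is to format the parsed values back. This is a sketch on a three-row subset of the sample data, assuming an English locale for the month abbreviations; the variable name parsed is only for illustration.

```python
import pandas as pd

# Small subset of the question's sample data.
df = pd.DataFrame({
    "cash_date": [
        "02-JAN-13 12.00.00.000000000 AM",
        "02-JAN-13 06.26.02.438000000 PM",
        "17-NOV-18 08.31.47.443000000 PM",
    ],
    "amount": [100, 500, 700],
    "id": [1, 7, 8],
})

# Parse with a 12-hour clock (%I) so that %p is honoured.
parsed = pd.to_datetime(
    df["cash_date"], format="%d-%b-%y %I.%M.%S.%f %p", errors="coerce"
)

# Format back to DD-Mon-YY text and upper-case the month abbreviation.
df["date"] = parsed.dt.strftime("%d-%b-%y").str.upper()
print(df)
```

Rows that fail to parse become NaT under errors='coerce', so they end up as missing values in the date column instead of raising.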
On the other hand, if you are looking to extract the date component without parsing the date, there are a couple of options:
str.split
df['date'] = df['cash_date'].str.split(n=1).str[0]
df
cash_date amount id date
0 02-JAN-13 12.00.00.000000000 AM 100 1 02-JAN-13
1 13-FEB-13 12.00.00.000000000 AM 200 2 13-FEB-13
2 09-MAR-13 12.00.00.000000000 AM 300 3 09-MAR-13
3 03-APR-13 12.00.00.000000000 AM 400 4 03-APR-13
4 02-JAN-13 06.26.02.438000000 PM 500 7 02-JAN-13
5 17-NOV-18 08.31.47.443000000 PM 700 8 17-NOV-18
Or, use a list comprehension.
df['date'] = [x.split(None, 1)[0] for x in df['cash_date']]
df
cash_date amount id date
0 02-JAN-13 12.00.00.000000000 AM 100 1 02-JAN-13
1 13-FEB-13 12.00.00.000000000 AM 200 2 13-FEB-13
2 09-MAR-13 12.00.00.000000000 AM 300 3 09-MAR-13
3 03-APR-13 12.00.00.000000000 AM 400 4 03-APR-13
4 02-JAN-13 06.26.02.438000000 PM 500 7 02-JAN-13
5 17-NOV-18 08.31.47.443000000 PM 700 8 17-NOV-18
I would bet this is the faster of the two options.
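That bet is easy to check on your own machine. The following is a rough timing sketch, not a benchmark: the synthetic series, its size, and the helper names with_str_split and with_list_comp are assumptions made for illustration, and the numbers will vary with the pandas version and hardware.

```python
import timeit

import pandas as pd

# Synthetic column built by repeating two of the question's sample values.
s = pd.Series(
    ["02-JAN-13 12.00.00.000000000 AM", "17-NOV-18 08.31.47.443000000 PM"] * 20_000
)


def with_str_split(col):
    # Vectorised string accessor: split once on whitespace, keep the first piece.
    return col.str.split(n=1).str[0]


def with_list_comp(col):
    # Plain Python loop over the values, rebuilt into a Series on the same index.
    return pd.Series([x.split(None, 1)[0] for x in col], index=col.index)


split_result = with_str_split(s).tolist()
comp_result = with_list_comp(s).tolist()

for fn in (with_str_split, with_list_comp):
    seconds = timeit.timeit(lambda: fn(s), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

One design consideration beyond speed: the list comprehension raises an AttributeError if the column contains missing values (NaN is a float and has no split method), whereas str.split passes them through as NaN.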