Pandas: Change dates in dataframe to same date format
Question
I have a dataframe that contains a column which holds:
Date:
31MAR2005
30-06-05
311205
I would like to convert these dates to the format 30-06-05 (DD-MM-YY). What is the simplest way to do this? The fields are not in a date format yet; they are plain strings.
Recommended answer
You could use pandas' vectorized string methods to extract the day, month and year from each date string:
import pandas as pd
df = pd.DataFrame(['31MAR2005', '30-06-05', '311205'], columns=['Date'])
# Capture the day, the month (three letters or two digits) and the last two digits of the year
tmp = df['Date'].str.extract(r'(\d{2})-?(\D{3}|\d{2})-?.*(\d{2})')
tmp.columns = ['day', 'month', 'year']
yields
In [228]: tmp
Out[228]:
  day month year
0  31   MAR   05
1  30    06   05
2  31    12   05
Now you can change the 3-letter month abbreviations to numeric strings by calling Series.map:
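import calendar
# Map 'JAN'..'DEC' to '01'..'12' (calendar.month_abbr follows the current locale)
monthmap = {calendar.month_abbr[i].upper(): '{:02d}'.format(i) for i in range(1, 13)}
# Map numeric months to themselves so that they survive the lookup unchanged
monthmap.update({'{:02d}'.format(i):'{:02d}'.format(i) for i in range(1, 13)})
tmp['month'] = tmp['month'].map(monthmap)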
yields
In [230]: tmp
Out[230]:
  day month year
0  31    03   05
1  30    06   05
2  31    12   05
And finally, you can re-assign df['Date'] to the desired date-string format:
df['Date'] = tmp['day']+'-'+tmp['month']+'-'+tmp['year']
yields
In [232]: df
Out[232]:
       Date
0  31-03-05
1  30-06-05
2  31-12-05
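The three steps above can be collected into one function. This is only a sketch: the name normalize_dates is made up for illustration, the .str.upper() call is an addition that lets lowercase abbreviations such as 'mar' match too, and the month lookup assumes an English locale for calendar.month_abbr.

```python
import calendar
import pandas as pd

def normalize_dates(dates: pd.Series) -> pd.Series:
    """Reformat mixed date strings to DD-MM-YY without parsing them as dates."""
    # Capture day, month (3 letters or 2 digits) and the last two digits of the year.
    parts = dates.str.extract(r'(\d{2})-?(\D{3}|\d{2})-?.*(\d{2})')
    parts.columns = ['day', 'month', 'year']
    # Map 'JAN'..'DEC' and '01'..'12' onto two-digit month strings.
    monthmap = {calendar.month_abbr[i].upper(): f'{i:02d}' for i in range(1, 13)}
    monthmap.update({f'{i:02d}': f'{i:02d}' for i in range(1, 13)})
    # Upper-casing is an assumption added here, not part of the original steps.
    parts['month'] = parts['month'].str.upper().map(monthmap)
    return parts['day'] + '-' + parts['month'] + '-' + parts['year']

df = pd.DataFrame(['31MAR2005', '30-06-05', '311205'], columns=['Date'])
df['Date'] = normalize_dates(df['Date'])
print(df)
```

Keeping the logic in one function makes it easy to apply the same clean-up to several columns.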
Especially if the DataFrame is long, using vectorized string methods should be faster than using df.apply to call a Python function once for every row value.
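For comparison, a row-wise version might look like the sketch below. The helper reformat_one and the constants MONTHS and PATTERN are hypothetical names introduced here; the function is called once per value through Series.apply, which is the per-row overhead the vectorized approach avoids.

```python
import re
import pandas as pd

MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
PATTERN = re.compile(r'(\d{2})-?(\D{3}|\d{2})-?.*(\d{2})')

def reformat_one(text):
    # Hypothetical per-row helper: runs once for every value in the column.
    day, month, year = PATTERN.search(text).groups()
    if not month.isdigit():
        # Turn a 3-letter abbreviation into a two-digit month number.
        month = f'{MONTHS.index(month) + 1:02d}'
    return f'{day}-{month}-{year}'

dates = pd.Series(['31MAR2005', '30-06-05', '311205'])
row_wise = dates.apply(reformat_one)
print(row_wise.tolist())
```

Both approaches give the same strings on this sample. How large the speed difference is depends on the data, so timing both on a realistic column (for example with %timeit in IPython) is the way to check.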
Also note that this accomplishes the task without parsing the strings as Timestamps. That might be a good or a bad thing. On the one hand, it may improve performance. On the other hand, it may allow invalid date strings (such as '30FEB2005') to slip through.
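A small sketch of how an impossible date passes through the string-only approach. The month map is cut down to a single entry to keep the example short.

```python
import pandas as pd

dates = pd.Series(['30FEB2005'])  # 30 February does not exist
parts = dates.str.extract(r'(\d{2})-?(\D{3}|\d{2})-?.*(\d{2})')
parts.columns = ['day', 'month', 'year']
# Trimmed-down month map, enough for this one example.
parts['month'] = parts['month'].map({'FEB': '02'})
reformatted = parts['day'] + '-' + parts['month'] + '-' + parts['year']
print(reformatted.tolist())  # ['30-02-05']
```

Nothing in the pipeline checks that the day exists in the given month, so the result looks just as well-formed as a valid date.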
After re-formatting the strings, you could call
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y', errors='coerce')
to convert the date strings into proper Timestamps. With errors='coerce', invalid date strings become NaT (Not-a-Time) values; with the default errors='raise', they raise an exception instead.
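A minimal sketch of that validation step, assuming the strings are already in DD-MM-YY form. Passing format='%d-%m-%y' avoids day/month guessing, and errors='coerce' is what turns unparseable values into NaT; the sample values are made up for illustration.

```python
import pandas as pd

# The second entry is well-formed but is not a real calendar date.
reformatted = pd.Series(['31-03-05', '30-02-05'])
parsed = pd.to_datetime(reformatted, format='%d-%m-%y', errors='coerce')
valid = parsed.notna()
print(parsed)
print(valid.tolist())  # [True, False]
```

The boolean mask valid can then be used to inspect or drop the rows that failed to parse.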