Pandas: Change dates in dataframe to same date format

Question

I have a dataframe that contains a column which holds:

Date:
31MAR2005
30-06-05
311205

I would like to convert these dates to a single format: 30-06-05 (DD-MM-YY). What is the simplest way to do this? The fields are not datetime values yet, only strings.

Recommended answer

You could use pandas' vectorized string methods to extract the day, month, and year from each date string:

import pandas as pd

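df = pd.DataFrame(['31MAR2005', '30-06-05', '311205'], columns=['Date'])
# Groups: 2-digit day, month (3 non-digit characters or 2 digits), last 2 digits of the year.
# The optional hyphens and the greedy .* skip separators and a leading century.
tmp = df['Date'].str.extract(r'(\d{2})-?(\D{3}|\d{2})-?.*(\d{2})')
tmp.columns = ['day', 'month', 'year']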

yields

In [228]: tmp
Out[228]: 
  day month year
0  31   MAR   05
1  30    06   05
2  31    12   05
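
As a variation on the extraction step, the pattern can use named groups so that the resulting columns are labeled directly. This is only a sketch of the same regex with `(?P<name>...)` groups added; the variable name `pattern` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(['31MAR2005', '30-06-05', '311205'], columns=['Date'])

# Named groups (?P<name>...) become the column names of the result,
# so no separate renaming step is needed.
pattern = r'(?P<day>\d{2})-?(?P<month>\D{3}|\d{2})-?.*(?P<year>\d{2})'
tmp = df['Date'].str.extract(pattern)

print(tmp.columns.tolist())  # ['day', 'month', 'year']
```

Strings that do not match the pattern at all produce NaN in every column, which makes unparseable rows easy to spot.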

Now you can change 3-letter month abbreviations to numeric strings by calling Series.map:

import calendar
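# Map 'JAN'..'DEC' to '01'..'12' (calendar.month_abbr follows the current locale).
monthmap = {calendar.month_abbr[i].upper(): '{:02d}'.format(i) for i in range(1, 13)}
# Let numeric months such as '06' map to themselves.
monthmap.update({'{:02d}'.format(i): '{:02d}'.format(i) for i in range(1, 13)})
tmp['month'] = tmp['month'].map(monthmap)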

yields

In [230]: tmp
Out[230]: 
  day month year
0  31    03   05
1  30    06   05
2  31    12   05
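
One thing worth knowing about Series.map with a dict: values that are not keys of the dict come back as NaN. The sketch below uses that to flag unrecognized months; the `months` Series and its bad entries ('XYZ', '13') are made-up sample data, not part of the original question.

```python
import calendar
import pandas as pd

monthmap = {calendar.month_abbr[i].upper(): '{:02d}'.format(i) for i in range(1, 13)}
monthmap.update({'{:02d}'.format(i): '{:02d}'.format(i) for i in range(1, 13)})

# Hypothetical input: 'XYZ' and '13' are not valid months.
months = pd.Series(['MAR', '06', 'XYZ', '13'])
mapped = months.map(monthmap)

# Keys missing from the dict map to NaN, so isna() flags the bad rows.
bad = months[mapped.isna()]
print(bad.tolist())  # ['XYZ', '13']
```

If the data may contain mixed-case abbreviations such as 'Mar', calling `.str.upper()` on the column before `.map(monthmap)` should make them match the upper-case keys.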

And finally, you can re-assign df['Date'] to the desired date-string format:

df['Date'] = tmp['day']+'-'+tmp['month']+'-'+tmp['year']

yields

In [232]: df
Out[232]: 
       Date
0  31-03-05
1  30-06-05
2  31-12-05
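
For reuse, the three steps can be folded into a single helper. The following is a minimal sketch, not the answer author's code: the function name `normalize_dates` is hypothetical, and the added `.str.upper()` call is an assumption to tolerate lower-case month abbreviations.

```python
import calendar
import pandas as pd

def normalize_dates(dates: pd.Series) -> pd.Series:
    """Return DD-MM-YY strings built from the mixed-format date strings."""
    # Columns 0, 1, 2 hold the day, month, and 2-digit year.
    parts = dates.str.extract(r'(\d{2})-?(\D{3}|\d{2})-?.*(\d{2})')
    monthmap = {calendar.month_abbr[i].upper(): f'{i:02d}' for i in range(1, 13)}
    monthmap.update({f'{i:02d}': f'{i:02d}' for i in range(1, 13)})
    # Upper-case first so that 'Mar' and 'MAR' both map to '03'.
    month = parts[1].str.upper().map(monthmap)
    return parts[0] + '-' + month + '-' + parts[2]

df = pd.DataFrame(['31MAR2005', '30-06-05', '311205'], columns=['Date'])
df['Date'] = normalize_dates(df['Date'])
print(df['Date'].tolist())  # ['31-03-05', '30-06-05', '31-12-05']
```

Rows that fail to match the pattern, or whose month is not recognized, end up as NaN because string concatenation with a missing value stays missing.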

Especially if the DataFrame is long, using vectorized string methods should be faster than using df.apply to call a Python function once for every row value.
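
The performance claim is easy to check on your own data. Below is a rough timing sketch, assuming a test Series made by repeating the three sample strings; the helper names `vectorized` and `row_by_row` are made up for this comparison, and the actual numbers will depend on the pandas version and the machine.

```python
import re
import timeit
import pandas as pd

# Hypothetical test data: the three sample strings repeated many times.
dates = pd.Series(['31MAR2005', '30-06-05', '311205'] * 10_000)
pattern = r'(\d{2})-?(\D{3}|\d{2})-?.*(\d{2})'
compiled = re.compile(pattern)

def vectorized():
    # One call that processes the whole column.
    return dates.str.extract(pattern)

def row_by_row():
    # One Python-level function call per row; returns a Series of tuples.
    return dates.apply(lambda s: compiled.match(s).groups())

t_vec = timeit.timeit(vectorized, number=3)
t_row = timeit.timeit(row_by_row, number=3)
print(f'vectorized: {t_vec:.3f}s, row-by-row: {t_row:.3f}s')
```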

Also note that this accomplishes the task without parsing the strings as Timestamps. That might be a good or a bad thing. On the one hand, it may improve performance. On the other hand, it may allow invalid date strings (such as '30FEB2005') to slip through.

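After re-formatting the strings, you could call

df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y', errors='coerce')

to convert the date strings into proper Timestamps. With errors='coerce', invalid date strings become NaT (Not-a-Time) values; without it, pd.to_datetime raises an error when it meets an invalid string. Passing format='%d-%m-%y' makes the day-first, two-digit-year layout explicit instead of leaving it to inference.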
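
To illustrate the validation step, here is a small sketch that assumes an explicit `format='%d-%m-%y'` and `errors='coerce'`. The sample Series is invented; '30-02-05' stands in for what '30FEB2005' would look like after re-formatting.

```python
import pandas as pd

# Hypothetical reformatted column; '30-02-05' (30 February) is not a real date.
dates = pd.Series(['31-03-05', '30-02-05', '31-12-05'])

# errors='coerce' turns unparseable or impossible dates into NaT.
parsed = pd.to_datetime(dates, format='%d-%m-%y', errors='coerce')
print(parsed.isna().tolist())  # [False, True, False]

# Format valid Timestamps back to DD-MM-YY strings; NaT rows stay missing.
back = parsed.dt.strftime('%d-%m-%y')
```

Parsing and then formatting back with `dt.strftime` is one way to keep the DD-MM-YY strings while still rejecting impossible dates.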
