Convert datetime to another format without changing dtype
Question
I'm learning Pandas on my own and have run into a few problems.
In a DataFrame read from a CSV file, I have one column containing dates in different formats (such as '%m/%d/%Y' and '%Y-%m-%d', and some values may be blank), and I want to unify the format of this column. I don't know whether other formats are present as well. When I use pd.to_datetime(), it raises errors such as a format mismatch or non-time-like data. How can I unify the format of this column?
I have converted part of that column to datetime dtype, and it is in YYYY-mm-dd format. Can I keep the datetime dtype and change the format to '%m/%d/%Y'? I have used Series.dt.strftime(), which changes the format but also changes the dtype to str instead of keeping the datetime dtype.
Recommended answer
"When I use pd.to_datetime(), it raises errors such as a format mismatch or non-time-like data. How can I unify the format of this column?"
Use the errors='coerce' option to return NaT (Not a Time) for values that cannot be converted. Also note that the format argument is not required. Omitting it lets Pandas try to work out the format itself, falling back to NaT when it cannot parse a value (see footnote 1). Be aware that pandas 2.0 and later infer a single format from the first non-missing value and apply it to the whole column; for a column that genuinely mixes formats, pass format='mixed' so that each value is parsed individually. For example:
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')
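As a minimal sketch of what errors='coerce' does, assume a small hypothetical column holding ISO dates, one blank, and one non-date value (the sample data is invented for illustration):

```python
import pandas as pd

# Hypothetical sample data: two ISO dates, a blank, and a non-date value.
df = pd.DataFrame({"datetime": ["2018-06-05", "", "not a date", "2018-12-31"]})

# Values that cannot be parsed become NaT instead of raising an error.
df["datetime"] = pd.to_datetime(df["datetime"], errors="coerce")

print(df["datetime"].dtype)         # a datetime64 dtype
print(df["datetime"].isna().sum())  # 2: the blank and the non-date value
```

Counting the NaT values after conversion is a quick way to check how much of the column failed to parse.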
Beware: mixed formats may be interpreted incorrectly. For example, how will Pandas know whether 05/06/2018 is 5th June or 6th May? A default order of conventions is applied (month first, unless you pass dayfirst=True), and if you need greater control you will need to apply a customised ordering of formats yourself.
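One way to apply a customised ordering is to try each known format explicitly, in priority order, and keep the first successful parse for each value. This is only a sketch under assumptions: the names raw, known_formats, parsed and unparsed are hypothetical, and the list of formats is whatever you know your data contains.

```python
import pandas as pd

# Hypothetical sample column mixing two formats, a blank, and a bad value.
raw = pd.Series(["05/06/2018", "2018-07-04", "", "garbage"])

# Assumed formats in priority order; extend the list as new ones turn up.
known_formats = ["%m/%d/%Y", "%Y-%m-%d"]

# Parse with the first format, then fill the remaining gaps with later ones.
parsed = pd.to_datetime(raw, format=known_formats[0], errors="coerce")
for fmt in known_formats[1:]:
    parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors="coerce"))

# Non-blank values that matched no format, kept for manual inspection.
unparsed = raw[parsed.isna() & raw.str.strip().ne("")]

print(parsed)
print(unparsed)
```

Because '%m/%d/%Y' is listed first, 05/06/2018 is read as 6th May here. Inspecting unparsed reveals formats you did not know about, which you can then add to the list.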
"Can I keep the datetime dtype and change the format to '%m/%d/%Y'?"
No, you cannot. datetime series are stored internally as integers. Any human-readable date representation is just that, a representation, not the underlying integer. To get your custom formatting, you can use the methods available in Pandas. You can even store such a text representation in a pd.Series variable:
formatted_dates = df['datetime'].dt.strftime('%m/%d/%Y')
The dtype of formatted_dates will be object, which indicates that the elements of your series point to arbitrary Python types. In this case, those arbitrary types happen to be all strings.
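The difference can be checked directly. The sketch below assumes a small hypothetical DataFrame; the exact name of the text dtype may vary between pandas versions, so the comments describe it only loosely.

```python
import pandas as pd

# Hypothetical two-row frame with a proper datetime column.
df = pd.DataFrame({"datetime": pd.to_datetime(["2018-06-05", "2018-12-31"])})

# strftime returns a new series of strings; the original column is untouched.
formatted_dates = df["datetime"].dt.strftime("%m/%d/%Y")

print(df["datetime"].dtype)      # still a datetime64 dtype
print(formatted_dates.dtype)     # a text dtype, no longer datetime64
print(formatted_dates.tolist())  # ['06/05/2018', '12/31/2018']
```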
Lastly, I strongly recommend that you do not convert a datetime series to strings until the very last step in your workflow. As soon as you do so, you will no longer be able to use efficient, vectorised datetime operations on that series.
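A sketch of that workflow, assuming a hypothetical DataFrame and using an in-memory buffer in place of a real file: do the date arithmetic while the column is still datetime, and apply the '%m/%d/%Y' format only at export time through the date_format parameter of to_csv.

```python
import io

import pandas as pd

# Hypothetical frame with a datetime column.
df = pd.DataFrame({"datetime": pd.to_datetime(["2018-06-05", "2018-12-31"])})

# Vectorised datetime operations work while the dtype is still datetime64.
df["next_week"] = df["datetime"] + pd.Timedelta(days=7)
df["weekday"] = df["datetime"].dt.day_name()

# Apply the display format only when writing the result out.
buffer = io.StringIO()
df.to_csv(buffer, index=False, date_format="%m/%d/%Y")
csv_text = buffer.getvalue()
print(csv_text)
```

The DataFrame keeps its datetime columns throughout; only the written text uses the month/day/year layout.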
1 Omitting format sacrifices performance and contrasts with datetime.strptime, which requires the format to be specified. Internally, Pandas uses the dateutil library, as indicated in the docs.