Handling Pandas dataframe columns with mixed date formats


Problem description

I have imported a CSV file with mixed data formats: some date formats that read_csv recognizes, plus some Excel serial date-time values (e.g. 41,866.321).

Once the data is imported, the column dtype is shown as object (because of the different types of data), and the dates in both formats are stored as strings.

I would like to use the to_datetime method to convert the recognized string dates in the dataframe column into datetimes, leaving the unrecognized Excel-format strings untouched so that I can isolate and correct them offline. But unless I apply the method row by row (far too slow), it fails to do this.

Does anyone have a cleverer way of solving this?

Update: having tinkered some more, I have found this solution. It uses coerce=True (spelled errors='coerce' in current pandas versions) to force the column datatype conversion, and then identifies the null values, which I can cross-reference back to the original file. But if there is a better way to do this (e.g. fixing the unrecognized timestamps in place), please let me know.

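# errors='coerce' (coerce=True in old pandas versions) turns unparseable values into NaT
df1['DateTime'] = pd.to_datetime(df1['Time_Date'], errors='coerce')
# Original strings whose conversion failed, for cross-referencing against the source file
nulls = df1['Time_Date'][df1['DateTime'].isnull()]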

Recommended answer

Having tinkered some more, I have found this solution: use errors='coerce' to force the column datatype conversion, and then identify the null values, which I can cross-reference back to the original file. But if there is a better way to do this (e.g. fixing the unrecognized timestamps in place), please let me know.

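# Unparseable values (such as the Excel serial strings) become NaT
df1['DateTime'] = pd.to_datetime(df1['Time_Date'], errors='coerce')
# Original strings whose conversion failed, for cross-referencing against the source file
nulls = df1['Time_Date'][df1['DateTime'].isnull()]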
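
On the open question of fixing the unrecognized timestamps in place, one possible approach is sketched below. It is not part of the original answer. It assumes that every value which parses as a plain number (after stripping thousands separators) is an Excel serial date in the 1900 date system, whose day count starts at 1899-12-30, and that the remaining date strings share one format. The helper name parse_mixed_dates is hypothetical.

```python
import pandas as pd


def parse_mixed_dates(raw: pd.Series) -> pd.Series:
    """Parse date strings, treating numeric strings as Excel serial dates."""
    text = raw.astype(str).str.strip()

    # Values such as "41,866.321" become floats; everything else becomes NaN.
    serial = pd.to_numeric(text.str.replace(",", "", regex=False), errors="coerce")
    is_serial = serial.notna()

    # Parse only the non-numeric strings; anything unparseable becomes NaT.
    parsed = pd.to_datetime(text.where(~is_serial), errors="coerce")

    # Assumption: serials count days from 1899-12-30 (Excel 1900 date system).
    from_serial = pd.to_datetime(serial, unit="D", origin="1899-12-30")

    # Fill the gaps left by the string parse with the converted serials.
    return parsed.fillna(from_serial)


if __name__ == "__main__":
    df1 = pd.DataFrame(
        {"Time_Date": ["2014-08-15 07:42:00", "41,866.321", "not a date"]}
    )
    df1["DateTime"] = parse_mixed_dates(df1["Time_Date"])
    # Rows that are still NaT need manual correction.
    print(df1[df1["DateTime"].isna()])
```

Values that are neither parseable dates nor numbers still end up as NaT, so the null check from the answer remains useful for isolating them. If the date strings come in several different formats, recent pandas versions may need an explicit format argument to parse them all in one call.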
