Message "Exception ignored" when dealing with the pandas datetime type


Question

I have an xlsx file with a column containing dates in the format "01.01.1900 09:01:25". The file is password protected, so I read it into a dataframe by means of the win32com.client library.

Here is the code:

import pandas as pd
import win32com.client

# File holds the path to the password-protected workbook (defined elsewhere)
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.DisplayAlerts = False
# Arguments: Filename, UpdateLinks, ReadOnly, Format, Password (here " ")
xlwb = xlApp.Workbooks.Open(File, False, True, None, " ")
xlws = xlwb.Sheets("Sheet 1")  # Open Sheet 1

# Get table dimensions
LastRow = xlws.Range("A1").CurrentRegion.Rows.Count
LastColumn = xlws.Range("A1").CurrentRegion.Columns.Count
# Range.Value returns a tuple of row tuples; the first row is the header
header = list(xlws.Range(xlws.Cells(1, 1), xlws.Cells(1, LastColumn)).Value[0])
content = list(xlws.Range(xlws.Cells(2, 1), xlws.Cells(LastRow, LastColumn)).Value)

# Build the dataframe
df = pd.DataFrame(data=content, columns=header)
print(df)

I checked that, once imported, the dtype has been automatically and correctly assigned to datetime64 for that column. The issue is that any time I try to do anything with a value of that column (just print it or compare it), I get a message saying:

  File "pandas\_libs\tslibs\timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info

AttributeError: 'NoneType' object has no attribute 'total_seconds'

Exception ignored in: 'pandas._libs.tslib._localize_tso'
Traceback (most recent call last):
  File "pandas\_libs\tslibs\timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Traceback (most recent call last):

Nonetheless, the code works perfectly, but the warning message is annoying.

Is there anything I can do with the datatype to avoid that warning?

Recommended answer

When the Excel file is opened this way, the content variable is a list of tuples.

Looking at those tuples, each date carries a TimeZoneInfo that localizes it to a time zone, in my case "GMT Standard Time".

So once converted to a dataframe, df.dtypes reports not just "datetime64" but "datetime64 (UTC+0:00) Dublin, Edinburgh, ..."
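As a rough illustration of that effect without Excel, the sketch below builds a dataframe from a list of tuples whose datetimes already carry tzinfo. The column names and the use of datetime.timezone.utc are assumptions made for the example; win32com attaches its own TimeZoneInfo object, which is not reproduced here.

```python
from datetime import datetime, timezone

import pandas as pd

# Stand-in for the rows returned by Range.Value: a list of tuples whose
# datetimes carry tzinfo. timezone.utc is an assumption for illustration;
# win32com attaches its own TimeZoneInfo object instead.
content = [
    (datetime(1900, 1, 1, 9, 1, 25, tzinfo=timezone.utc), 1),
    (datetime(1900, 1, 2, 9, 1, 25, tzinfo=timezone.utc), 2),
]
df = pd.DataFrame(data=content, columns=["when", "n"])

# pandas infers a timezone-aware dtype rather than plain datetime64
is_tz_aware = isinstance(df["when"].dtype, pd.DatetimeTZDtype)
print(df.dtypes)
print(is_tz_aware)
```

Checking for pd.DatetimeTZDtype like this is one way to find out which columns picked up a time zone during the import.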

This time zone setting only happens when the Excel file is opened through win32com.client. If you remove the password, you can open the file with pandas.read_excel and see that no time zone is set for those datetimes and the warning does not appear.

I don't know exactly why this happens, but I have a solution for the original example. The warning disappears when the time zone is set to one recognized by the tz database, such as "UTC", or simply to None. Something like:

df["col_name"]=df["col_name"].dt.tz_convert(None)
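One thing worth knowing before applying this: tz_convert(None) first converts the values to UTC and then drops the time zone, while tz_localize(None) drops the time zone and keeps the local wall-clock time. For a UTC+0 zone the two give the same result, but they differ otherwise. A minimal sketch, assuming a fixed UTC+1 offset chosen only to make the difference visible:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Hypothetical tz-aware column; the UTC+1 offset is an assumption for the demo
plus_one = timezone(timedelta(hours=1))
s = pd.Series([datetime(1900, 1, 1, 9, 1, 25, tzinfo=plus_one)])

# Convert to UTC, then drop the time zone (wall time shifts by the offset)
as_utc = s.dt.tz_convert(None)

# Drop the time zone but keep the local wall time unchanged
as_wall = s.dt.tz_localize(None)

print(as_utc.iloc[0])
print(as_wall.iloc[0])
```

If the values read from Excel should keep exactly the clock time shown in the sheet, tz_localize(None) may be the safer choice; for the "GMT Standard Time" case in the question either call should remove the time zone.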
