Message "Exception ignored" when dealing with the pandas datetime type
Problem description
I have an xlsx file with a column containing dates in the format "01.01.1900 09:01:25". The file is password protected, so I load it into a dataframe by means of the win32com.client library.
Here is the code:
import pandas as pd
import win32com.client

# File is the path to the password-protected workbook (defined elsewhere)
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.DisplayAlerts = False
# Arguments: Filename, UpdateLinks, ReadOnly, Format, Password (the password here is " ")
xlwb = xlApp.Workbooks.Open(File, False, True, None, " ")
xlws = xlwb.Sheets("Sheet 1")  # Open Sheet 1

# Get table dimensions
LastRow = xlws.Range("A1").CurrentRegion.Rows.Count
LastColumn = xlws.Range("A1").CurrentRegion.Columns.Count

# First row holds the header; the remaining rows hold the data
header = list(xlws.Range(xlws.Cells(1, 1), xlws.Cells(1, LastColumn)).Value[0])
content = list(xlws.Range(xlws.Cells(2, 1), xlws.Cells(LastRow, LastColumn)).Value)

# Build the dataframe
df = pd.DataFrame(data=content, columns=header)
print(df)
I checked that, once imported, the dtype has been automatically and correctly assigned as datetime64 for that column. The issue is that any time I try to do anything with any value of that column (just print it or compare it), I get a message saying:
File "pandas\_libs\tslibs\timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslib._localize_tso'
Traceback (most recent call last):
File "pandas\_libs\tslibs\timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Traceback (most recent call last):
Nonetheless, the code works perfectly, but the warning message is annoying.
Is there anything I can do with the datatype to avoid that warning?
Recommended answer
When the Excel file is opened this way, the content variable is a list of tuples.
Looking at those tuples, each date carries a TimeZoneInfo that localizes it to a time zone, in my case "GMT Standard Time".
So once the data is converted to a dataframe, running df.dtypes shows that the result is not just "datetime64" but "datetime64 (UTC+0:00) Dublin, Edinburgh, ..."
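The effect described above can be sketched without Excel. The snippet below is only an illustration under assumptions: the win32com tuples are replaced by a hand-made tuple whose datetime carries `datetime.timezone.utc`, and the column names `date` and `value` are made up. It shows that a time zone attached to the incoming values ends up in the column dtype; it does not reproduce the warning itself.

```python
from datetime import datetime, timezone

import pandas as pd

# Stand-in for the tuples returned by win32com (assumption: a tz-aware datetime)
content = [(datetime(1900, 1, 1, 9, 1, 25, tzinfo=timezone.utc), 1)]
header = ["date", "value"]  # hypothetical column names

df = pd.DataFrame(data=content, columns=header)

# The time zone travels with the values into the column dtype
print(df.dtypes)
print(df["date"].dt.tz)

is_tz_aware = isinstance(df["date"].dtype, pd.DatetimeTZDtype)
```

If `is_tz_aware` is true for a column loaded from the real workbook, that column is a candidate for the fix discussed further down.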
This time zone is only set when the Excel file is opened through win32com.client. If you remove the password, you can open the file with pandas.read_excel and see that no time zone is set for those datetimes and the warning does not appear.
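To compare the two loading routes, it can help to list which columns are time zone aware. The following is a minimal sketch on a synthetic dataframe; the column names `naive`, `aware` and `n` are hypothetical, and the data is made up rather than read from the workbook.

```python
import pandas as pd

stamps = pd.to_datetime(["2020-01-01 09:01:25", "2020-01-02 10:00:00"])

# Synthetic frame: one naive datetime column, one tz-aware, one integer
df = pd.DataFrame(
    {
        "naive": stamps,
        "aware": stamps.tz_localize("UTC"),
        "n": [1, 2],
    }
)

# "datetimetz" selects only the tz-aware datetime columns
aware_columns = df.select_dtypes(include="datetimetz").columns.tolist()
print(aware_columns)
```

Running the same check on a dataframe built via win32com and on one built via pandas.read_excel should, per the observation above, give a non-empty list only in the first case.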
I don't know exactly why this happens, but I have a solution for the original example. The warning disappears when the time zone is set to one recognized by the tz database, such as "UTC", or simply to None. Something like:
df["col_name"]=df["col_name"].dt.tz_convert(None)
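As a hedged illustration of what that line does, the sketch below applies `tz_convert(None)` to a synthetic tz-aware series and contrasts it with `tz_localize(None)`. The fixed +01:00 offset and the sample timestamp are assumptions chosen only to make the difference visible; they are not taken from the original workbook.

```python
from datetime import timedelta, timezone

import pandas as pd

# Synthetic tz-aware series with a fixed +01:00 offset (assumption)
plus_one = timezone(timedelta(hours=1))
aware = pd.Series(pd.to_datetime(["2020-06-01 09:01:25"])).dt.tz_localize(plus_one)

# tz_convert(None): convert to UTC, then drop the time zone
naive_utc = aware.dt.tz_convert(None)

# tz_localize(None): drop the time zone, keep the local wall-clock time
naive_wall = aware.dt.tz_localize(None)

print(naive_utc.iloc[0])
print(naive_wall.iloc[0])
```

When the offset is zero the two calls give the same values, so either removes the time zone. With a non-zero offset, `tz_localize(None)` is the one that preserves the times as they appear in the sheet.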