Pandas read_excel returning type object when time is 00:00

Problem description

In more recent versions of pandas (I am using 1.2.3), reading times from an Excel file goes wrong when the time is 00:00:00. In the script below, filepath is the path to my Excel file, which contains a column with the header 'Time'.

import pandas as pd

df = pd.read_excel(filepath)
print(df['Time'])

Output:

0                20:00:00
1                22:00:00
2                23:00:00
3     1899-12-30 00:00:00
4                02:00:00
5                02:45:00
6                03:30:00
7                04:00:00
8                04:45:00
9                05:30:00
10               07:00:00
11               08:00:00
12               08:45:00
13               09:30:00
14               10:30:00
15               10:45:00
16               11:45:00
17               12:30:00
18               13:15:00
19               14:00:00
20               14:45:00
21               15:45:00
22               23:00:00
23    1899-12-30 00:00:00

This was not the case in version 1.0.5.

Is there a way to read in these times correctly, without the date on rows 3 and 23 above?

Recommended answer

I can reproduce this behavior (pandas 1.2.3); it leaves you with a mix of datetime.datetime and datetime.time objects in the 'Time' column.
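If you would rather keep the object column, the mix described above can also be normalized directly. The following is only a sketch: the Series `times` is a hypothetical stand-in for what read_excel returned, and `to_time` is an illustrative helper, not part of pandas.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for the 'Time' column: mostly datetime.time values,
# with midnight arriving as a datetime.datetime dated 1899-12-30.
times = pd.Series(
    [dt.time(23, 0), dt.datetime(1899, 12, 30, 0, 0), dt.time(2, 0)],
    dtype=object,
    name="Time",
)


def to_time(value):
    """Return a datetime.time, dropping the date part if there is one."""
    if isinstance(value, dt.datetime):
        return value.time()
    return value


# Count the Python types present to confirm the column really is mixed.
type_counts = times.map(lambda v: type(v).__name__).value_counts()

# Every entry becomes a plain datetime.time.
normalized = times.map(to_time)
```

This keeps the values as datetime.time objects, so it only helps if that is the type you want to work with afterwards.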

One workaround is to import the time column as strings, which you can specify explicitly like this:

df = pd.read_excel(path_to_your_excelfile, dtype={'Time': str})

This gives you strings, some of which are prefixed with the "Excel day zero" date. You can remove the prefix by splitting on the space and taking the last element of the split result:

df['Time'].str.split(' ').str[-1]

Now you can convert the strings to datetime, timedelta, etc., whatever makes sense in your context.
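As an illustration of the string route, here is a minimal sketch that strips the date prefix and converts the result to timedelta. The Series `raw` is a hypothetical stand-in for a column read with dtype={'Time': str}; the variable names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical stand-in for the 'Time' column read as strings: the midnight
# entry carries the "Excel day zero" date prefix.
raw = pd.Series(["23:00:00", "1899-12-30 00:00:00", "02:00:00"], name="Time")

# Keep only the part after the last space, i.e. the clock time.
clock = raw.str.split(" ").str[-1]

# Convert to timedelta (time elapsed since midnight), which supports arithmetic.
as_timedelta = pd.to_timedelta(clock)
```

Whether timedelta is the right target depends on what you do next; it is convenient for sums and differences, less so for display.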

Another way to handle this is to tell pandas to parse this column as datetime, like this:

df = pd.read_excel(path_to_your_excelfile, parse_dates=['Time'])

Then you'll have pandas datetimes, with either today's date or "Excel day zero":

df['Time']

0    2021-03-04 20:00:00
1    2021-03-04 22:00:00
2    2021-03-04 23:00:00
3    1899-12-30 00:00:00
4    2021-03-04 02:00:00
...
23   1899-12-30 00:00:00
Name: Time, dtype: datetime64[ns]

Now you have some options, depending on what you intend to do with the data. You could just ignore the date, strip it (df['Time'].dt.time), or format it as a string (df['Time'].dt.strftime('%H:%M:%S')), etc.
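The two stripping options can be tried without an Excel file. This sketch assumes a hypothetical Series `parsed` that mimics the datetime64 column shown above, with mixed dates.

```python
import pandas as pd

# Hypothetical stand-in for the parsed 'Time' column: one date for most rows,
# and the "Excel day zero" date for midnight.
parsed = pd.Series(
    pd.to_datetime(
        ["2021-03-04 23:00:00", "1899-12-30 00:00:00", "2021-03-04 02:00:00"]
    ),
    name="Time",
)

# Option 1: drop the date and keep datetime.time objects (object dtype).
time_objects = parsed.dt.time

# Option 2: format as HH:MM:SS strings.
time_strings = parsed.dt.strftime("%H:%M:%S")
```

Both results are independent of which date pandas attached, so the midnight rows end up consistent with the rest.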
