Binning a Pandas column of timestamps

Question

I am trying to bin a column of timestamps in a dataframe. The timestamps are of the format 0:00:00, and I think they are strings. I tried using uber.dtypes() to check, but it keeps returning an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b4120eada070> in <module>()
----> 1 uber.dtypes()

TypeError: 'Series' object is not callable

[Image: the dataframe, for reference]

uber["Time"].head().to_dict() returns the following:

{0: '0:11:00', 1: '0:17:00', 2: '0:21:00', 3: '0:28:00', 4: '0:33:00'}

When I use these bins and labels:

bins = np.arange(0, 25, 1)
labels = [
    "0:00-1:00",
    "1:01-2:00",
    "2:01-3:00",
    "3:01-4:00",
    "4:01-5:00",
    "5:01-6:00",
    "6:01-7:00",
    "7:01-8:00",
    "8:01-9:00",
    "9:01-10:00",
    "10:01-11:00",
    "11:01-12:00",
    "12:01-13:00",
    "13:01-14:00",
    "14:01-15:00",
    "15:01-16:00",
    "16:01-17:00",
    "17:01-18:00",
    "18:01-19:00",
    "19:01-20:00",
    "20:01-21:00",
    "21:01-22:00",
    "22:01-23:00",
    "23:01-24:00"
]

uber["Hour"] = pd.cut(uber["Time"], bins, labels = labels)

I get the following error:

TypeError: '<' not supported between instances of 'int' and 'str'

If I change the bins to:

bins = str(np.arange(0, 25, 1))

I get this error:

AxisError: axis -1 is out of bounds for array of dimension 0

I realize I could probably convert these to seconds and use pd.to_numeric() to convert the column to integers so they can be binned, but I've poked around the documentation and am still unclear on how exactly to do so using datetime or time (I could do it the long way and multiply the hours and minutes out into seconds).

1) How could I convert these timestamps to seconds using datetime or time?

2) Is there a way to bin these without converting the timestamps to seconds?

I have also tried converting the values in uber["Time"] to datetime.time objects and inserting them in a new column ["Time Object"] before binning:

for i in range(len(uber["Time"])):
    uber.loc[i, "Time Object"] = datetime.datetime.strptime(uber.loc[i, "Time"], "%H:%M:%S").time()

If I try to bin using the ["Time Object"] column:

uber["Hour"] = pd.cut(uber["Time Object"], bins = 24, labels = labels)

Then I get this error:

TypeError: '<=' not supported between instances of 'datetime.time' and 'str'

If I try to bin using the hour of the ["Time Object"] column:

uber["Hour"] = pd.cut(uber["Time Object"].hour, bins = 24, labels = labels)

I get this error:

AttributeError: 'Series' object has no attribute 'hour'

Recommended answer

You can try converting the times to minutes and binning on those:

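import numpy as np
import pandas as pd

# Sample values taken from the question
uber = pd.DataFrame({'Time': ['0:11:00', '0:17:00', '0:21:00', '0:28:00', '0:33:00']})

# 60 edges (minutes 0..59) give 59 intervals, one per label.
# These edges are inferred from the 59 categories in the output; the original snippet did not define bins.
bins = np.arange(0, 60, 1)
labels = [str(i) + ':01-' + str(i + 1) + ':00' for i in range(59)]

# Parse the strings as durations, then bin the minutes-since-midnight value
uber.Time = pd.to_timedelta(uber.Time)
pd.cut(uber.Time.dt.seconds / 60, bins, labels=labels)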

Out:

0    10:01-11:00
1    16:01-17:00
2    20:01-21:00
3    27:01-28:00
4    32:01-33:00
Name: Time, dtype: category
Categories (59, object): [0:01-1:00 < 1:01-2:00 < 2:01-3:00 < 3:01-4:00 ... 55:01-56:00 < 56:01-57:00 < 57:01-58:00 < 58:01-59:00]
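
The answer above bins by minute, while the question asks for hourly bins and for a way to get seconds. A minimal sketch of both, assuming the strings always look like H:MM:SS and stay under 24 hours. The last two sample rows and the label format are made up for illustration; they are not from the original post.

```python
import numpy as np
import pandas as pd

# First two values come from the question; the last two are made-up extra rows.
uber = pd.DataFrame({"Time": ["0:11:00", "0:17:00", "13:45:00", "23:59:59"]})

# Parse the "H:MM:SS" strings as durations since midnight.
elapsed = pd.to_timedelta(uber["Time"])

# Question 1: seconds since midnight as plain floats.
uber["Seconds"] = elapsed.dt.total_seconds()

# Question 2: take the hour component directly, with no detour through seconds.
hours = elapsed.dt.components["hours"]

# 25 edges (0..24) give 24 intervals; right=False makes each one [h, h + 1).
edges = np.arange(0, 25)
hour_labels = [f"{h}:00-{h}:59" for h in range(24)]  # hypothetical label format
uber["Hour"] = pd.cut(hours, bins=edges, labels=hour_labels, right=False)
```

With right=False, a time of exactly 1:00:00 lands in the 1:00-1:59 bin rather than the previous one, which is usually what hour-of-day grouping wants.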

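Two of the errors in the question are unrelated to binning itself. dtypes is an attribute rather than a method, so calling it with parentheses raises the TypeError, and a Series of datetime.time objects has no .hour attribute of its own. The sketch below shows one way around both, assuming the same H:MM:SS strings; the last sample row and the "Hour Number" column name are made up for illustration.

```python
import datetime

import pandas as pd

# First three values come from the question; "13:45:00" is a made-up extra row.
uber = pd.DataFrame({"Time": ["0:11:00", "0:17:00", "0:21:00", "13:45:00"]})

# `dtypes` holds a Series of column dtypes, so it is read without parentheses.
column_types = uber.dtypes

# Build the datetime.time objects without an explicit index loop.
uber["Time Object"] = uber["Time"].map(
    lambda s: datetime.datetime.strptime(s, "%H:%M:%S").time()
)

# The Series itself has no `.hour`; read the attribute from each element instead.
uber["Hour Number"] = uber["Time Object"].map(lambda t: t.hour)
```

The resulting integer column can be passed to pd.cut with numeric bins, which avoids comparing strings or datetime.time objects against numbers.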