How can I do a sequential count based on column value and timestamp in pandas?


Problem description

I would like to be able to add a column which counts rows in order based on a value. For example, below are three different people with records that have a timestamp. I want to count the order of records based on the PersonID. This should restart for every PersonID. (I am able to do this in Tableau with Index() but I want it part of the raw file too)

    PersonID  DateTime             Order  Total
    a226      2015-04-16 11:57:36  1      1
    a226      2015-04-17 15:32:14  2      1
    a226      2015-04-17 19:13:43  3      1
    z342      2015-04-15 07:02:20  1      1
    x391      2015-04-17 13:43:31  1      1
    x391      2015-04-17 05:12:16  2      1

Is there also a way to subtract the DateTime values? My approach would be to select only the Order 1 rows as a dataframe, then only the Order 2 rows, then merge the two and subtract. Is there a way to do this automatically?

Solution

IIUC, you can do a groupby with cumcount:

>>> df["Order"] = df.groupby("PersonID").cumcount() + 1
>>> df
  PersonID             DateTime  Order
0     a226  2015-04-16 11:57:36      1
1     a226  2015-04-17 15:32:14      2
2     a226  2015-04-17 19:13:43      3
3     z342  2015-04-15 07:02:20      1
4     x391  2015-04-17 13:43:31      1
5     x391  2015-04-17 05:12:16      2

If you want to guarantee that it's in increasing time order, you should sort by DateTime first, but your example has x391 in non-increasing order, so I'm assuming you don't want that.
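For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the frame by hand from the rows in the question (the construction is my assumption about how the data is loaded) and shows that cumcount numbers rows in their current row order, not in time order:

```python
import pandas as pd

# Sample rows copied from the question, kept in the order given there.
df = pd.DataFrame({
    "PersonID": ["a226", "a226", "a226", "z342", "x391", "x391"],
    "DateTime": [
        "2015-04-16 11:57:36",
        "2015-04-17 15:32:14",
        "2015-04-17 19:13:43",
        "2015-04-15 07:02:20",
        "2015-04-17 13:43:31",
        "2015-04-17 05:12:16",
    ],
})

# cumcount() numbers the rows of each group 0, 1, 2, ... in their
# current row order, so add 1 to get a 1-based counter.
df["Order"] = df.groupby("PersonID").cumcount() + 1

order_list = df["Order"].tolist()
print(df)
```

Note that the x391 record at 13:43:31 gets Order 1 even though it is the later of the two, simply because it comes first in the frame.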


If you do want to involve the timestamps, I tend to sort first, to make life easier:

>>> df["DateTime"] = pd.to_datetime(df["DateTime"]) # just in case
>>> df = df.sort_values(["PersonID", "DateTime"])
>>> df["Order"] = df.groupby("PersonID").cumcount() + 1
>>> df
  PersonID            DateTime  Order
0     a226 2015-04-16 11:57:36      1
1     a226 2015-04-17 15:32:14      2
2     a226 2015-04-17 19:13:43      3
5     x391 2015-04-17 05:12:16      1
4     x391 2015-04-17 13:43:31      2
3     z342 2015-04-15 07:02:20      1
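Once the frame is sorted like this, the DateTime subtraction asked about in the question can probably be done without any merging. The sketch below is one way to do it, assuming a per-person gap to the previous record is what is wanted; the column name SincePrev is made up for illustration, and sort_values is used for the sort step:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "PersonID": ["a226", "a226", "a226", "z342", "x391", "x391"],
    "DateTime": [
        "2015-04-16 11:57:36",
        "2015-04-17 15:32:14",
        "2015-04-17 19:13:43",
        "2015-04-15 07:02:20",
        "2015-04-17 13:43:31",
        "2015-04-17 05:12:16",
    ],
})

df["DateTime"] = pd.to_datetime(df["DateTime"])
df = df.sort_values(["PersonID", "DateTime"])

# Sequential counter per person, now in increasing time order.
df["Order"] = df.groupby("PersonID").cumcount() + 1

# Time elapsed since the same person's previous record.
# The first record of each person has no predecessor, so it gets NaT.
# "SincePrev" is a hypothetical column name.
df["SincePrev"] = df.groupby("PersonID")["DateTime"].diff()

print(df)
```

Each person's first row comes out as NaT, which is usually what you want, since there is nothing to subtract from.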

Even without sorting, though, you could call rank on the grouped column, which has more options to specify how you want to handle ties. Note that rank returns floats, so add .astype(int) if you want an integer column:

>>> df["Order"] = df.groupby("PersonID")["DateTime"].rank()
>>> df
  PersonID            DateTime  Order
0     a226 2015-04-16 11:57:36    1.0
1     a226 2015-04-17 15:32:14    2.0
2     a226 2015-04-17 19:13:43    3.0
5     x391 2015-04-17 05:12:16    1.0
4     x391 2015-04-17 13:43:31    2.0
3     z342 2015-04-15 07:02:20    1.0
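To make the tie handling concrete, here is a small sketch on made-up data (not the question's) in which one person has two records with the same timestamp. The method values used are standard options of rank; the result column names are just for illustration:

```python
import pandas as pd

# Hypothetical data: person "a" has two records at the same instant.
df = pd.DataFrame({
    "PersonID": ["a", "a", "a", "b"],
    "DateTime": pd.to_datetime([
        "2015-04-17 10:00:00",
        "2015-04-17 10:00:00",
        "2015-04-17 12:00:00",
        "2015-04-15 07:02:20",
    ]),
})

grouped = df.groupby("PersonID")["DateTime"]

# Default method="average": tied rows share the mean of their ranks.
df["rank_average"] = grouped.rank()
# "min": tied rows share the lowest rank, and the next rank is skipped.
df["rank_min"] = grouped.rank(method="min")
# "dense": like "min", but no ranks are skipped after a tie.
df["rank_dense"] = grouped.rank(method="dense")
# "first": ties are broken by row order, giving a strict 1, 2, 3, ... count.
df["rank_first"] = grouped.rank(method="first").astype(int)

print(df)
```

If what you need is a strict sequential counter, method="first" is the variant that behaves most like the sort-then-cumcount approach.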
