Pandas assign group numbers for each time bin
Question
I have a pandas dataframe that looks like below.
Key Name Val1 Val2 Timestamp
101 A 10 1 01-10-2019 00:20:21
102 A 12 2 01-10-2019 00:20:21
103 B 10 1 01-10-2019 00:20:26
104 C 20 2 01-10-2019 14:40:45
105 B 21 3 02-10-2019 09:04:06
106 D 24 3 02-10-2019 09:04:12
107 A 24 3 02-10-2019 09:04:14
108 E 32 2 02-10-2019 09:04:20
109 A 10 1 02-10-2019 09:04:22
110 B 10 1 02-10-2019 10:40:49
Starting from the earliest timestamp, that is, '01-10-2019 00:20:21', I need to create time bins of 10 seconds each and assign the same group number to all rows whose timestamps fall in the same time bin. The output should look as below.
Key Name Val1 Val2 Timestamp Group
101 A 10 1 01-10-2019 00:20:21 1
102 A 12 2 01-10-2019 00:20:21 1
103 B 10 1 01-10-2019 00:20:26 1
104 C 20 2 01-10-2019 14:40:45 2
105 B 21 3 02-10-2019 09:04:06 3
106 D 24 3 02-10-2019 09:04:12 4
107 A 24 3 02-10-2019 09:04:14 4
108 E 32 2 02-10-2019 09:04:20 4
109 A 10 1 02-10-2019 09:04:22 5
110 B 10 1 02-10-2019 10:40:49 6
First time bin: '01-10-2019 00:20:21' to '01-10-2019 00:20:30', next time bin: '01-10-2019 00:20:31' to '01-10-2019 00:20:40', next time bin: '01-10-2019 00:20:41' to '01-10-2019 00:20:50', next time bin: '01-10-2019 00:20:51' to '01-10-2019 00:21:00', next time bin: '01-10-2019 00:21:01' to '01-10-2019 00:21:10', and so on. Each row's 'Group' is assigned according to the time bin its timestamp falls into. Group numbers do not have to be consecutive (if no rows fall into a time bin, it's fine to skip that group number).
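The bins listed above are 10-second intervals anchored on the earliest timestamp: bin k starts at t0 + 10k seconds, and with whole-second data its last second is t0 + 10k + 9 seconds. A minimal sketch that reproduces the listed bins, assuming the dates are day-first (the bin arithmetic is the same if they are month-first); `FMT`, `BIN`, and `bin_index` are illustrative names, not part of the question:

```python
import pandas as pd

FMT = "%d-%m-%Y %H:%M:%S"  # assumption: day-first dates
BIN = pd.Timedelta(seconds=10)

t0 = pd.to_datetime("01-10-2019 00:20:21", format=FMT)  # earliest timestamp

# Bin k covers [t0 + k*BIN, t0 + (k+1)*BIN); with whole-second data the
# last second inside bin k is t0 + k*BIN + 9s, as the question writes it.
bins = [
    ((t0 + k * BIN).strftime(FMT),
     (t0 + k * BIN + pd.Timedelta(seconds=9)).strftime(FMT))
    for k in range(5)
]
for start, end in bins:
    print(start, "to", end)


def bin_index(ts):
    """0-based index of the 10-second bin that ts falls into (hypothetical helper)."""
    return (ts - t0) // BIN  # Timedelta // Timedelta gives an int
```

Because the bin index is plain floor division, every row can be assigned without a loop, and skipped indexes simply correspond to empty bins.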
I have generated this using a for loop, but it takes a lot of time if the data is spread across months. Please let me know if this can be done as a pandas operation using a single line of code. Thanks.
Recommended answer
Here is an example without an explicit for loop. The main approach is to round each timestamp's seconds down to the start of its 10-second range and then number the ranges with ngroup():
02-10-2019 09:04:12 -> 02-10-2019 09:04:11
02-10-2019 09:04:14 -> 02-10-2019 09:04:11
02-10-2019 09:04:20 -> 02-10-2019 09:04:11
02-10-2019 09:04:21 -> 02-10-2019 09:04:21
02-10-2019 09:04:25 -> 02-10-2019 09:04:21
...
I use a new temporary column to store the start of each timestamp's range.
import pandas as pd

df = pd.DataFrame.from_dict({
    'Name': ('A', 'A', 'B', 'C', 'B', 'D', 'A', 'E', 'A', 'B'),
    'Val1': (1, 2, 1, 2, 3, 3, 3, 2, 1, 1),
    'Timestamp': (
        '2019-01-10 00:20:21',
        '2019-01-10 00:20:21',
        '2019-01-10 00:20:26',
        '2019-01-10 14:40:45',
        '2019-02-10 09:04:06',
        '2019-02-10 09:04:12',
        '2019-02-10 09:04:14',
        '2019-02-10 09:04:20',
        '2019-02-10 09:04:22',
        '2019-02-10 10:40:49',
    )
})
# convert str to Timestamp
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# your specific ranges (:01-:10, :11-:20, ..., :51-:00); customize if you need
def sec_to_group(x):
    # drop sub-second precision so timestamps in the same range compare equal
    x = x.replace(microsecond=0, nanosecond=0)
    if 1 <= x.second <= 10:
        x = x.replace(second=1)
    elif 11 <= x.second <= 20:
        x = x.replace(second=11)
    elif 21 <= x.second <= 30:
        x = x.replace(second=21)
    elif 31 <= x.second <= 40:
        x = x.replace(second=31)
    elif 41 <= x.second <= 50:
        x = x.replace(second=41)
    elif 51 <= x.second <= 59:
        x = x.replace(second=51)
    else:
        # second == 0 closes the :51-:00 range of the previous minute
        x = x.replace(second=51) - pd.Timedelta(minutes=1)
    return x

# new temporary column formatted_dt holding the start of each row's range
df['formatted_dt'] = df['Timestamp'].apply(sec_to_group)
# group by the new column, number the groups with ngroup(), then drop it
df['Group'] = df.groupby('formatted_dt').ngroup()
df.drop(columns=['formatted_dt'], inplace=True)
print(df)
Output:
# Name Val1 Timestamp Group
# 0 A 1 2019-01-10 00:20:21 0 <- ngroup() numbers groups from 0
# 1 A 2 2019-01-10 00:20:21 0
# 2 B 1 2019-01-10 00:20:26 0
# 3 C 2 2019-01-10 14:40:45 1
# 4 B 3 2019-02-10 09:04:06 2
# ....
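The row-wise `apply` can also be replaced by vectorized datetime rounding. This is a swapped-in technique, not the answer's method: shifting back one second, flooring to the 10-second grid with `Series.dt.floor`, and shifting forward one second lands on the :01, :11, ..., :51 range starts (e.g. 09:04:14 -> 09:04:11, as in the examples above), and sends second :00 into the previous minute's :51 range. Like the answer, this grid is tied to the minute, so it matches the question's bins only because the earliest timestamp ends in second :21. A minimal sketch on the answer's data:

```python
import pandas as pd

df = pd.DataFrame({"Timestamp": pd.to_datetime([
    "2019-01-10 00:20:21", "2019-01-10 00:20:21", "2019-01-10 00:20:26",
    "2019-01-10 14:40:45", "2019-02-10 09:04:06", "2019-02-10 09:04:12",
    "2019-02-10 09:04:14", "2019-02-10 09:04:20", "2019-02-10 09:04:22",
    "2019-02-10 10:40:49",
])})

one_sec = pd.Timedelta(seconds=1)
# Shift back 1 s, floor to the :00/:10/... grid, shift forward 1 s:
# the result is the :01/:11/.../:51 start of each row's 10-second range.
bin_start = (df["Timestamp"] - one_sec).dt.floor("10s") + one_sec

# Number the distinct range starts in sorted order, starting from 0
df["Group"] = df.groupby(bin_start).ngroup()
print(df)
```

Since no Python function is called per row, this should scale better than `apply` on data spanning months.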
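You could also try TimeGrouper or resample. (TimeGrouper has since been removed from pandas; its replacement is pd.Grouper with a freq argument.)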
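Another option, closer to the question's wording because the bins are anchored on the earliest timestamp rather than on the minute grid, is plain floor division of each row's offset from the minimum by 10 seconds. This sketch is an assumption-labeled alternative, not the answer's code: it assumes day-first dates (with month-first dates the raw bin indexes change but the final group numbers do not), keeps only some of the question's columns, and uses the hypothetical name `bin_idx`. The raw indexes already satisfy the question, since skipped numbers are allowed; a dense rank turns them into 1, 2, 3, ...

```python
import pandas as pd

df = pd.DataFrame({
    "Key": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    "Name": ["A", "A", "B", "C", "B", "D", "A", "E", "A", "B"],
    "Timestamp": [
        "01-10-2019 00:20:21", "01-10-2019 00:20:21", "01-10-2019 00:20:26",
        "01-10-2019 14:40:45", "02-10-2019 09:04:06", "02-10-2019 09:04:12",
        "02-10-2019 09:04:14", "02-10-2019 09:04:20", "02-10-2019 09:04:22",
        "02-10-2019 10:40:49",
    ],
})
# Assumption: day-first dates; use "%m-%d-%Y %H:%M:%S" if they are month-first
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d-%m-%Y %H:%M:%S")

# 0-based index of the 10-second bin, counted from the earliest timestamp
bin_idx = (df["Timestamp"] - df["Timestamp"].min()) // pd.Timedelta(seconds=10)

# Dense rank maps the sparse bin indexes to consecutive group numbers from 1
df["Group"] = bin_idx.rank(method="dense").astype(int)
print(df)
```

The two assignment lines can be chained into a single expression if a one-liner is preferred; the result reproduces the Group column from the question.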
Hope this helps.