Grouping Messages by Time Intervals
Question
I'm trying to group sent messages into 1-second time intervals. I currently calculate the latency with this:
import csv

INFILE = 'C:/Users/klee/Documents/test.txt'

def time_deltas(infile):
    # Yield (out_ts, ref_id, latency_ms) for each "T in: [A]" line
    # that matches an earlier "T out: [O]" line.
    ts = {}
    with open(infile, "r") as f:
        for line in f:
            e = line.split()
            if " ".join(e[2:5]) == "T out: [O]":
                ts[e[8]] = e[0]
            elif " ".join(e[2:5]) == "T in: [A]":
                in_ts, ref_id = e[0], e[7]
                out_ts = ts.pop(ref_id, None)
                if out_ts is None:
                    continue  # no matching "T out" line was seen
                yield (float(out_ts), ref_id[1:-1], (float(in_ts)*1000 - float(out_ts)*1000))

with open('test.csv', 'w') as f:
    csv.writer(f).writerows(time_deltas(INFILE))
However, I want to count the number of "T in: [A]" messages sent per second, and have been trying to adapt the following to do so:
import datetime
import bisect
import collections

data = [
    (datetime.datetime(2010, 2, 26, 12, 8, 17), 5594813),
    (datetime.datetime(2010, 2, 26, 12, 7, 31), 5594810),
    (datetime.datetime(2010, 2, 26, 12, 6, 4), 5594807),
]
interval = datetime.timedelta(seconds=50)
start = datetime.datetime(2010, 2, 26, 12, 6, 4)
grid = [start + n * interval for n in range(10)]
bins = collections.defaultdict(list)
for date, num in data:
    idx = bisect.bisect(grid, date)
    bins[idx].append(num)
for idx, nums in bins.items():
    print('{0} --- {1}'.format(grid[idx], len(nums)))
Found here: "Python: group results by time intervals"
(I realize the units would be off for what I want, but I'm just looking into the general idea...)
I've been mostly unsuccessful thus far and would appreciate any help. Thanks!
Also, the data appears as:
082438.577652 - T in: [A] accepted. ordID [F25Q6] timestamp [082438.575880] RefNumber [6018786] State [L]
Thanks again! I really appreciate it. :D
Recommended answer
Assuming you want to group your data into 1-second intervals aligned on the second, we can make use of the fact that your data is ordered and that int(out_ts) truncates the timestamp to the second, which we can use as a grouping key.
The simplest way to do the grouping is to use itertools.groupby:
from itertools import groupby

data = time_deltas(INFILE)  # the generator defined in the question
get_key = lambda x: int(x[0])  # function to get group key from data
bins = [(k, list(g)) for k, g in groupby(data, get_key)]
bins will be a list of tuples where the first value in each tuple is the key (an integer, e.g. 82438) and the second value is the list of data entries that were issued during that second (with timestamp = 082438.*).
Example usage:
# print out the number of messages for each second
for sec, data in bins:
    print('{0} --- {1}'.format(sec, len(data)))

# write (sec, msg_per_sec) out to CSV file
import csv
with open("test.csv", "w") as f:
    csv.writer(f).writerows((s, len(d)) for s, d in bins)

# get average message per second
message_counts = [len(d) for s, d in bins]
avg_msg_per_second = float(sum(message_counts)) / len(message_counts)
P.S. In this example, a list was used for bins so that the order of the data is maintained. If you need random access to the data, consider using an OrderedDict instead.
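A rough sketch of the OrderedDict variant follows, again on invented sample rows. It assumes the input is sorted by timestamp; if the same key could show up in two separate runs, the later group would overwrite the earlier one.

```python
from collections import OrderedDict
from itertools import groupby

# Hypothetical rows in the shape (out_ts, ref_id, latency_ms), sorted by out_ts.
rows = [
    (82438.10, "A1", 1.5),
    (82438.90, "A2", 2.0),
    (82439.20, "A3", 0.7),
]

# Same grouping as before, but keyed by second for direct lookup.
bins = OrderedDict((k, list(g)) for k, g in groupby(rows, lambda r: int(r[0])))

print(len(bins[82438]))  # 2 messages in second 82438
print(list(bins))        # keys stay in insertion order: [82438, 82439]
```

Looking up a second with no messages raises KeyError, so bins.get(sec, []) may be the more convenient call.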
Note that it is relatively straightforward to adapt the solution to group by multiples of seconds. For example, to group messages per minute (60 seconds), change the get_key function to:
get_key = lambda x: int(x[0] / 60) # truncate timestamp to the minute
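One caveat: dividing by 60 only yields minutes if the timestamp is a plain count of seconds. The sample line in the question (082438.577652) looks like a clock time in HHMMSS.ffffff form. That is an assumption, not something the question confirms. If it holds, 082438 and 082459 fall in the same minute, yet int(82438 / 60) is 1373 and int(82459 / 60) is 1374. Under that assumption, a key that drops the two seconds digits may be closer to what is wanted; the name minute_key is made up for this sketch.

```python
def minute_key(row):
    # Assumes row[0] is a float in HHMMSS.ffffff form (an assumption
    # based on the sample line). Dropping the last two integer digits
    # leaves HHMM as the grouping key.
    return int(row[0]) // 100

# Both timestamps belong to 08:24, so they share the key 824.
print(minute_key((82438.577652, "F25Q6", 1.772)))  # 824
print(minute_key((82459.000001, "F25Q7", 0.9)))    # 824
```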