Information Retrieval: URL hits in a time frame
Problem description
Algorithm challenge:
Problem statement: How would you design a logging system for something like Google, such that you can query for the number of times a URL was opened between two timestamps?
i/p: start_time, end_time, URL1
o/p: number of times URL1 was opened between start_time and end_time.
Some specs:
A database is not an optimal solution.
A URL might have been opened multiple times at a given timestamp.
A URL might have been opened a large number of times between two timestamps.
start_time and end_time can be a month apart.
Time can be granular to the second.
Recommended answer
Solution 1:
Hash of hashes
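Key: URL. Value: a hash mapping each timestamp T at which the URL was opened to CumFrequency, the cumulative number of opens up to and including T.
URL Hash ----> T, CumFrequency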
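A minimal sketch of how such a hash of hashes might be built from raw log entries. The function name `build_index`, the `(url, timestamp)` input shape, and the use of integer seconds for timestamps are assumptions made for illustration, and the sample hits are made up.

```python
from collections import defaultdict
from itertools import accumulate


def build_index(hits):
    """Build {url: {timestamp: cumulative_count}} from (url, timestamp) pairs.

    Timestamps are assumed to be integers (seconds); hits may arrive unsorted.
    """
    # First pass: count how many times each URL was opened at each second.
    per_second = defaultdict(lambda: defaultdict(int))
    for url, ts in hits:
        per_second[url][ts] += 1

    # Second pass: turn per-second counts into running totals, in time order.
    index = {}
    for url, counts in per_second.items():
        timestamps = sorted(counts)
        cumulative = accumulate(counts[t] for t in timestamps)
        index[url] = dict(zip(timestamps, cumulative))
    return index


# Hypothetical sample: seconds since midnight for 11:00, 11:15 and 11:30.
sample_hits = (
    [("amazon", 39600)] * 3
    + [("amazon", 40500)] * 1
    + [("amazon", 41400)] * 7
)
index = build_index(sample_hits)
print(index["amazon"])
```

Only timestamps with at least one hit are stored, so the inner hash stays as small as the number of active seconds for that URL.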
For example:
Amazon Hash --> T, CumFreq
11:00 am   3   (opened 3 times at 11:00 am)
11:15 am   4   (opened 1 time at 11:15 am, cumfreq is 3+1=4)
11:30 am   11  (opened 7 times at 11:30 am, cumfreq is 3+1+7=11)
i/p: 11:10 am, 11:37 am, Amazon
The o/p can be obtained by subtraction: take the cumulative frequency at the last timestamp before 11:10 am, which is 11:00 am (3), and the cumulative frequency at the last active timestamp at or before 11:37 am, which is 11:30 am (11). Hence the result is 11 - 3 = 8.
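The subtraction step can be sketched as follows. This is only one way to do the lookup: it assumes the timestamps of one URL are kept in a sorted list alongside their cumulative counts, so that "last timestamp before start" and "last timestamp at or before end" can be found with binary search. The names `count_hits` and `hm` are hypothetical helpers, not part of the original post.

```python
from bisect import bisect_left, bisect_right


def count_hits(timestamps, cumulative, start, end):
    """Number of opens with start <= t <= end.

    timestamps: sorted list of active timestamps for one URL.
    cumulative: cumulative[i] is the total number of opens up to timestamps[i].
    """
    def cum_before(pos):
        # Cumulative count over the first `pos` entries (0 if there are none).
        return cumulative[pos - 1] if pos > 0 else 0

    hi = bisect_right(timestamps, end)    # number of entries with t <= end
    lo = bisect_left(timestamps, start)   # number of entries with t < start
    return cum_before(hi) - cum_before(lo)


def hm(hours, minutes):
    """Seconds since midnight, used here as a stand-in timestamp."""
    return hours * 3600 + minutes * 60


timestamps = [hm(11, 0), hm(11, 15), hm(11, 30)]
cumulative = [3, 4, 11]
print(count_hits(timestamps, cumulative, hm(11, 10), hm(11, 37)))  # 8
```

With the timestamps sorted, each query costs two binary searches regardless of how many times the URL was opened inside the window, which matters when start_time and end_time are a month apart.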
Can we do better?