网络日志中最频繁的3页序列 [英] Most Frequent 3 page sequence in a weblog

查看：80 发布时间：2020/6/3 20:48:41 algorithm

本文介绍了网络日志中最频繁的3页序列的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

给出一个包含用户，页面网址字段的网络日志。我们必须找出用户需要的最频繁的3页序列。

Given a web log which consists of fields 'User ' 'Page url'. We have to find out the most frequent 3-page sequence that users takes.

有一个时间戳记。并且不能保证单个用户访问将按顺序记录，就像user1 Page1 user2 Pagex user1 Page2 User10 Pagex user1 Page 3一样，其User1的页面顺序为page1-> page2-> page 3

There is a time stamp. and it is not guaranteed that the single user access will be logged sequentially it could be like user1 Page1 user2 Pagex user1 Page2 User10 Pagex user1 Page 3 her User1s page sequence is page1-> page2-> page 3

推荐答案

假设您的日志按时间戳顺序存储，以下是一种算法，可以满足您的需要：

Assuming your log is stored in timestamp order, here's an algorithm to do what you need:

创建一个哈希表'user_visits'，将用户ID映射到您观察到的用户访问的最后两个页面

创建一个哈希表'visit_count'，将3元组的页面映射到频率计数

对于日志中的每个条目（用户，URL）：

Create a hashtable 'user_visits' mapping user ID to the last two pages you observed them to visit
Create a hashtable 'visit_count' mapping 3-tuples of pages to frequency counts
For each entry (user, URL) in the log:

如果user_visits中存在用户如果有两个条目，则将与URL的三元组相对应的visit_count中的条目增加一个

在user_visits中的相关条目上附加 URL，并在必要时删除最旧的条目。

按值对visit_count哈希表进行排序。这是您最受欢迎的URL序列列表。

这里是Python中的一种实现，假设您的字段以空格分隔：

Here's an implementation in Python, assuming your fields are space-separated:

fh = open('log.txt', 'r')
user_visits = {}
visit_counts = {}
for row in fh:
  user, url = row.split(' ')
  prev_visits = user_visits.get(user, ())
  if len(prev_vists) == 2:
    visit_tuple = prev_vists + (url,)
    visit_counts[visit_tuple] = visit_counts.get(visit_tuple, 0) + 1
  user_visits[user] = (prev_vists[1], url)
popular_sequences = sorted(visit_counts, key=lambda x:x[1], reverse=True)

这篇关于网络日志中最频繁的3页序列的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

网络日志中最频繁的3页序列 [英] Most Frequent 3 page sequence in a weblog

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

网络日志中最频繁的3页序列 [英] Most Frequent 3 page sequence in a weblog

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭