Create a Session ID from User ID and time differences

Views: 66 | Published: 2020/10/15 20:12:35 | Tags: r, data.table

Question

I have a question similar to this one (Create a "sessionID" based on "userID" and differences in "timeStamp") about creating a session ID, though my specifications are slightly different. Perhaps the solution is already apparent in that post, but I could not apply it to my needs. Pointing out how the original solution satisfies my question would be just as good as a new answer.

My data.table looks like this (dput available below):

unique_visitor_id        datetime            
100                 2016-07-25 15:43:02      
100                 2016-08-15 15:35:16      
101                 2016-08-01 21:24:46      
101                 2016-08-13 05:32:27      
101                 2016-08-13 05:33:01      
101                 2016-08-13 05:33:37      
101                 2016-08-13 05:34:04      
101                 2016-08-13 05:37:42      
101                 2016-08-13 05:38:20      
102                 2016-09-15 17:28:00      
102                 2016-09-15 17:31:04      
103                 2016-07-18 21:19:07

NB: datetime was parsed with ymd_hms(datetime).

What I'd like is a new variable identifying the session, as a simple integer sequence (it does not need to incorporate the visitor ID, unlike the original question). A session is defined per visitor: consecutive records belong to the same session as long as they are <= 30 minutes apart AND fall on the same day. So, for example, the first two rows would be two different sessions: although it's the same visitor, the time difference is > 30 minutes.

The desired output from the above data would be:

unique_visitor_id        datetime            session_id
100                 2016-07-25 15:43:02           1
100                 2016-08-15 15:35:16           2
101                 2016-08-01 21:24:46           3
101                 2016-08-13 05:32:27           4
101                 2016-08-13 05:33:01           4
101                 2016-08-13 05:33:37           4
101                 2016-08-13 05:34:04           4
101                 2016-08-13 05:37:42           4
101                 2016-08-13 05:38:20           4
102                 2016-09-15 17:28:00           5
102                 2016-09-15 17:31:04           5
103                 2016-07-18 21:19:07           6

If this can be done in a data.table way, that would be ideal. Again, apologies if I am missing something from the original question's solution!

Here is the dput of the sample data.table:

myDT <- structure(list(unique_visitor_id = c(100L, 100L, 101L, 
                                 101L, 101L, 101L, 101L, 101L, 101L, 102L, 102L, 103L), 
           datetime = structure(c(1469475782, 1471289716, 1470101086, 1471080747, 1471080781, 
                                            1471080817, 1471080844, 1471081062, 1471081100, 1473974880, 
                                            1473975064, 1468891147), 
                                          tzone = "EST5EDT", class = c("POSIXct", "POSIXt"))), 
      .Names = c("unique_visitor_id", "datetime"), 
      sorted = c("unique_visitor_id", "datetime"), 
      class = c("data.table", "data.frame"), 
      row.names = c(NA, -12L))

Recommended answer

Assuming your data.table is already sorted by visitor ID and datetime, you can use cumsum() on a logical vector that is TRUE wherever a new session_id should start:

# diff() on POSIXct picks its units automatically, so compare raw seconds instead
myDT[, session_id := cumsum(c(TRUE, diff(unique_visitor_id) != 0 | diff(as.numeric(datetime)) > 30 * 60))][]

#    unique_visitor_id            datetime session_id
# 1:               100 2016-07-25 15:43:02          1
# 2:               100 2016-08-15 15:35:16          2
# 3:               101 2016-08-01 21:24:46          3
# 4:               101 2016-08-13 05:32:27          4
# 5:               101 2016-08-13 05:33:01          4
# 6:               101 2016-08-13 05:33:37          4
# 7:               101 2016-08-13 05:34:04          4
# 8:               101 2016-08-13 05:37:42          4
# 9:               101 2016-08-13 05:38:20          4
#10:               102 2016-09-15 17:28:00          5
#11:               102 2016-09-15 17:31:04          5
#12:               103 2016-07-18 21:19:07          6
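
The condition above covers the visitor change and the 30-minute gap, but not the "within the same day" part of the question. One possible way to add it is sketched below; this is an extension of the same cumsum() idea, not part of the original answer. The sample data and the names dt, day, new_visitor, long_gap and new_day are made up for illustration, and "same day" is assumed to mean the same calendar date in the column's own time zone.

```r
library(data.table)

# Hypothetical sample (not from the question): visitor 1 has two hits
# 10 minutes apart that straddle midnight.
dt <- data.table(
  unique_visitor_id = c(1L, 1L, 1L, 2L, 2L),
  datetime = as.POSIXct(
    c("2016-08-13 23:40:00", "2016-08-13 23:55:00", "2016-08-14 00:05:00",
      "2016-08-14 00:10:00", "2016-08-14 01:00:00"),
    tz = "UTC"
  )
)

# The cumsum() approach relies on this ordering.
setorder(dt, unique_visitor_id, datetime)

dt[, session_id := {
  # Calendar date of each record, formatted in the column's time zone.
  day         <- format(datetime, "%Y-%m-%d")
  new_visitor <- diff(unique_visitor_id) != 0
  long_gap    <- diff(as.numeric(datetime)) > 30 * 60  # gap in seconds
  new_day     <- day[-1L] != head(day, -1L)
  # The first row always opens a session; any TRUE afterwards opens another.
  cumsum(c(TRUE, new_visitor | long_gap | new_day))
}]

print(dt)
```

On this sample, the third row starts a new session even though it is only 10 minutes after the second, because the calendar date changes. If sessions should be allowed to run past midnight, dropping new_day from the condition gives back the behaviour of the one-liner above.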
