使用“登录"重塑 R 中的数据“退出"次 [英] Reshaping data in R with "login" "logout" times

查看:14
本文介绍了使用“登录"重塑 R 中的数据“退出"次的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 R 的新手,正在为自己的目的从事一个副项目.我有这个数据(可重现的 dput 在问题的末尾):

I'm new to R, and am working on a side project for my own purposes. I have this data (reproducable dput of this is at the end of the question):

     X            datetime  user  state
1    1 2016-02-19 19:13:26 User1 joined
2    2 2016-02-19 19:21:18 User2 joined
3    3 2016-02-19 19:21:33 User1 joined
4    4 2016-02-19 19:35:38 User1 joined
5    5 2016-02-19 19:44:15 User1 joined
6    6 2016-02-19 19:48:55 User1 joined
7    7 2016-02-19 19:52:40 User1 joined
8    8 2016-02-19 19:53:15 User3 joined
9    9 2016-02-19 20:02:34 User3 joined
10  10 2016-02-19 20:13:48 User3 joined
19 637 2016-02-19 19:13:32 User1   left
20 638 2016-02-19 19:25:26 User1   left
21 639 2016-02-19 19:30:30 User2   left
22 640 2016-02-19 19:42:16 User1   left
23 641 2016-02-19 19:47:59 User1   left
24 642 2016-02-19 19:51:06 User1   left
25 643 2016-02-19 20:02:26 User3   left 

我希望它看起来像这样:

I want it to look like this:

    user  joined                left
1   User1 2016-02-19 19:13:26   2016-02-19 19:13:32
2   User2 2016-02-19 19:21:18   2016-02-19 19:30:30
3   User3 2016-02-19 19:53:15   2016-02-19 20:02:26 
4   User1 2016-02-19 19:21:33   2016-02-19 19:25:26
.
.
.

我正在研究 tidyr,因为显然涉及一些重塑,但我无法理解到底需要做什么.这甚至可能吗(没有循环/大量的程序代码)?我无法理解如何绕过的问题是,无法知道特定的左"记录应该连接到特定的连接"记录.我可以找到的所有示例都涉及收集其他值的静态月份或日期.我应该补充一点,不一定保证所有记录都具有左"值(用户可能仍会加入").

I'm looking at tidyr as there's some reshaping involved obviously, but I can't wrap my head around what exactly needs to be done. Is this even possible (without looping/massive amounts of procedural code)? The problem I can't grasp how to get around is that there's no way to know that a particular "left" record should be joined to a particular "joined" record. Examples I can find all involve a static month or day over which other values are gathered. I should add that it's not necessarily guaranteed that all records are guaranteed to have a "left" value (a user might still be "joined").

这是数据样本的 dput 输出:

Here's the dput output of a sample of the data:

> dput(samp)
structure(list(X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 637L, 638L, 639L, 640L, 
641L, 642L, 643L, 644L, 645L, 646L, 647L, 648L, 649L, 650L, 651L
), datetime = structure(c(1L, 3L, 4L, 7L, 9L, 11L, 13L, 14L, 
16L, 18L, 21L, 22L, 23L, 26L, 27L, 30L, 32L, 33L, 2L, 5L, 6L, 
8L, 10L, 12L, 15L, 17L, 19L, 20L, 24L, 25L, 28L, 29L, 31L), .Label = c("2016-02-19 19:13:26", 
"2016-02-19 19:13:32", "2016-02-19 19:21:18", "2016-02-19 19:21:33", 
"2016-02-19 19:25:26", "2016-02-19 19:30:30", "2016-02-19 19:35:38", 
"2016-02-19 19:42:16", "2016-02-19 19:44:15", "2016-02-19 19:47:59", 
"2016-02-19 19:48:55", "2016-02-19 19:51:06", "2016-02-19 19:52:40", 
"2016-02-19 19:53:15", "2016-02-19 20:02:26", "2016-02-19 20:02:34", 
"2016-02-19 20:13:38", "2016-02-19 20:13:48", "2016-02-19 20:42:27", 
"2016-02-19 20:48:22", "2016-02-19 20:49:31", "2016-02-19 20:59:58", 
"2016-02-19 21:06:20", "2016-02-19 21:10:43", "2016-02-19 21:11:13", 
"2016-02-19 21:11:15", "2016-02-19 21:11:22", "2016-02-19 21:17:33", 
"2016-02-19 22:02:45", "2016-02-19 22:05:18", "2016-02-19 22:05:37", 
"2016-02-19 22:05:47", "2016-02-19 22:30:30"), class = "factor"), 
    user = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 
    3L, 4L, 1L, 1L, 4L, 4L, 4L, 3L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 
    3L, 3L, 1L, 4L, 1L, 1L, 4L, 4L), .Label = c("User1", "User2", 
    "User3", "User4"), class = "factor"), state = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L), .Label = c("joined", "left"), class = "factor")), .Names = c("X", 
"datetime", "user", "state"), class = "data.frame", row.names = c(NA, 
-33L))

推荐答案

我们可以利用left"和joined"的顺序,每个用户一个跟一个的匹配.

We can make use of the order of "left" and "joined", and match when one follows the other for each user.

为此,我将使用 library(data.table)

library(data.table)
setDT(df)

## order the data by user and datetime
df <- df[order(user, datetime)]
## add an 'order' column, which is a sequence from 1 to lenght()  
## for each user
df[, order := seq(1:.N), by=user]

## split the left and joins
dt_left <- df[state == "left"]
dt_joined <- df[state == "joined"]

## assuming 'left' is after 'joined', shift the 'order' back for left
dt_left[, order := order - 1]

## join user an dorder (and subsetting relevant columns) 
## keeping when there's a 'joined' but not a 'left'
dt <- dt_left[, .(user, order, datetime)][dt_joined[, .(user, order, datetime)], on=c("user", "order"), nomatch=NA]

## rename columns
setnames(dt, c("datetime", "i.datetime"), c("left", "joined"))

     user order                left              joined
 1: User1     1 2016-02-19 19:13:32 2016-02-19 19:13:26
 2: User1     3 2016-02-19 19:25:26 2016-02-19 19:21:33
 3: User1     5 2016-02-19 19:42:16 2016-02-19 19:35:38
 4: User1     7 2016-02-19 19:47:59 2016-02-19 19:44:15
 5: User1     9 2016-02-19 19:51:06 2016-02-19 19:48:55
 6: User1    11 2016-02-19 20:48:22 2016-02-19 19:52:40
 7: User1    13 2016-02-19 21:11:13 2016-02-19 21:06:20
 8: User1    15 2016-02-19 21:17:33 2016-02-19 21:11:15
 9: User2     1 2016-02-19 19:30:30 2016-02-19 19:21:18
10: User3     1 2016-02-19 20:02:26 2016-02-19 19:53:15
11: User3     3 2016-02-19 20:13:38 2016-02-19 20:02:34
12: User3     5 2016-02-19 20:42:27 2016-02-19 20:13:48
13: User3     7                  NA 2016-02-19 20:49:31
14: User3     8                  NA 2016-02-19 22:30:30
15: User4     1 2016-02-19 21:10:43 2016-02-19 20:59:58
16: User4     3 2016-02-19 22:02:45 2016-02-19 21:11:22
17: User4     5 2016-02-19 22:05:37 2016-02-19 22:05:18
18: User4     7                  NA 2016-02-19 22:05:47

这篇关于使用“登录"重塑 R 中的数据“退出"次的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆