Reshape Long to Wide Data in R
Problem description
I am trying to reshape some user data in R. I have a data.frame of session IDs. Each session has a userID and a date. I would like to use the "userID" variable as my key, but only for the observations that have a "userType" of "New Visitor", so there will be a single row for each "New Visitor". Each subsequent session ID should then be passed as a separate variable, together with its date. For instance, if a user ID has 3 session IDs in total, there would be a total of 6 variables in addition to the user ID.
For instance, if this is the data frame for a user:
date <- c('2015-01-01','2015-01-02','2015-01-02','2015-01-10')
userID <- c('100105276','100105276','100105276','100105276')
sessionID <- c('1452632119','1452634303','1452637067','1453600979')
userType <- c('New Visitor','Returning Visitor','Returning Visitor','Returning Visitor')
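# data.frame() keeps the columns as a data frame; cbind() would return a character matrix
df <- data.frame(date, userID, sessionID, userType, stringsAsFactors = FALSE)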
Instead, I would like to return this:
userID    sessionID1 date1      sessionID2 date2      sessionID3 date3
100105276 1452632119 2015-01-01 1452634303 2015-01-02 1452637067 2015-01-02
If any userID has no subsequent sessionIDs, the corresponding variables should be filled with NA. I've read up on using tidyr or reshape2 to do this, but I haven't been able to get them to do exactly what I am looking for.
Recommended answer
Given that your data is ordered by userID and sessionID, and each row is a unique session, you could do:
library(data.table)
# Convert the data to a data.table
df <- data.table(df)
df[, id := sequence(.N), by = c("userID")] # session sequence number per user
# Spread columns
reshape(df, timevar = "id", idvar = "userID", direction = "wide")
# userID date.1 sessionID.1 userType.1 date.2 sessionID.2 userType.2 date.3 sessionID.3 userType.3
#1 100105276 2015-01-01 1452632119 New Visitor 2015-01-02 1452634303 Returning Visitor 2015-01-02 1452637067 Returning Visitor
In this output, userType is also included as a variable for each session, but you can always drop those columns afterwards.
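If you would rather not create the userType columns in the first place, one possible alternative is data.table's dcast() with several value.var columns. The following is only a sketch under some assumptions: the second user (200000001) and its session are hypothetical and were added just to show the NA fill, and the session IDs are assumed to be equal-length digit strings so that sorting them as text matches their numeric order. Note that the resulting columns are named sessionID_1, date_1 and so on, not exactly as in the question.

```r
library(data.table)

# Example data from the question, plus one hypothetical user (200000001)
# with a single session, to show how missing sessions are filled with NA.
dt <- data.table(
  date      = c("2015-01-01", "2015-01-02", "2015-01-02", "2015-01-10", "2015-01-05"),
  userID    = c("100105276", "100105276", "100105276", "100105276", "200000001"),
  sessionID = c("1452632119", "1452634303", "1452637067", "1453600979", "1452700000"),
  userType  = c("New Visitor", "Returning Visitor", "Returning Visitor",
                "Returning Visitor", "New Visitor")
)

# Keep only users that have at least one "New Visitor" session.
new_users <- dt[userType == "New Visitor", unique(userID)]
dt <- dt[userID %in% new_users]

# Order sessions within each user, then number them 1, 2, 3, ...
setorder(dt, userID, sessionID)
dt[, id := seq_len(.N), by = userID]

# Cast to wide format using only sessionID and date, so userType never
# appears in the result. Users with fewer sessions get NA in the extra columns.
wide <- dcast(dt, userID ~ id, value.var = c("sessionID", "date"))
print(wide)
```

The result has one row per user, with all sessionID_* columns first and the date_* columns after them. If you prefer them interleaved, setcolorder() can rearrange the columns afterwards.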