Unlisting columns by groups

Question

I have a data frame in the following format:

id | name               | logs                                  
---+--------------------+-----------------------------------------
84 |          "zibaroo" |                             "C47931038" 
12 | "fabien kelyarsky" | c("C47331040", "B19412225", "B18511449")
96 |     "mitra lutsko" |              c("F19712226", "A18311450")
34 |       "PaulSandoz" |                             "A47431044" 
65 |       "BeamVision" |                             "D47531045" 

As you can see, the "logs" column holds a vector of strings in each cell.

Is there an efficient way to convert the data frame to long format (one observation per row) without the intermediate step of separating "logs" into several columns?

This matters because the dataset is very large and the number of logs per person appears to be arbitrary.

In other words, I need the following:

id | name               | log                                 
---+--------------------+------------
84 |          "zibaroo" | "C47931038" 
12 | "fabien kelyarsky" | "C47331040"
12 | "fabien kelyarsky" | "B19412225"
12 | "fabien kelyarsky" | "B18511449"
96 |     "mitra lutsko" | "F19712226"
96 |     "mitra lutsko" | "A18311450"
34 |       "PaulSandoz" | "A47431044" 
65 |       "BeamVision" | "D47531045" 

Here is the dput output for a section of the real data frame:

structure(list(id = 148:157, name = c("avihil1", "Niarfe", "doug henderson", 
"nick tan", "madisp", "woodbusy", "kevinhcross", "cylol", "andrewarrow", 
"gstavrev"), logs = list("Z47331572", "Z47031573", c("F47531574", 
"B195945", "D186871", "S192939", "S182865", "G19539045"), c("A47231575", 
"A190933", "C181859"), "F47431576", c("B47231577", "D193936", 
"Q184862"), "Y47331579", c("A47531580", "Z195944", "B185870"), 
"N47731581", "E47231582")), .Names = c("id", "name", "logs"
), row.names = 149:158, class = "data.frame")

Recommended answer

This is a perfect case for tidyr:

library(tidyr)
library(dplyr)  # loaded for the %>% pipe

# Expand the list column: each element of logs becomes its own row
dat %>% unnest(logs)
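To make the behaviour concrete, here is a minimal sketch on three rows taken from the example table in the question. It assumes the data frame is named `dat` (as in the answer) and that `logs` is a list column. The names `long_tidyr`, `long_base`, and `n` are introduced here only for illustration. Note that `unnest()` keeps the column name `logs`. The base R variant is an alternative added for comparison, not part of the original answer.

```r
library(tidyr)

# Small data frame mirroring the question's layout; `logs` is a list column.
dat <- data.frame(
  id = c(84L, 12L, 96L),
  name = c("zibaroo", "fabien kelyarsky", "mitra lutsko"),
  stringsAsFactors = FALSE
)
dat$logs <- list(
  "C47931038",
  c("C47331040", "B19412225", "B18511449"),
  c("F19712226", "A18311450")
)

# tidyr: one output row per element of the list column.
long_tidyr <- unnest(dat, logs)

# Base R alternative (illustrative): repeat id/name once per log entry.
n <- lengths(dat$logs)  # number of logs in each row
long_base <- data.frame(
  id = rep(dat$id, n),
  name = rep(dat$name, n),
  log = unlist(dat$logs),
  stringsAsFactors = FALSE
)
```

Both results should have one row per log entry, with `id` and `name` repeated as needed. Neither approach splits `logs` into intermediate columns, so the number of logs per person can vary freely.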
