Compress rows with common IDs into one row
Question
I have a question that I have not found an answer for. There are similar questions whose solutions don't quite work in my situation. I have a data set that has four columns like this example:
Name Session Sequence Page
Bob 001 001 home
Bob 001 002 news
Bob 001 003 contact_us
Bob 001 004 home
Sally 001 001 home
Sally 001 002 contact_us
Bob 002 001 home
John 001 001 home
John 001 002 about_us
What I would like is something like this:
Name Session Pages
Bob 001 home-news-contact_us-home
Sally 001 home-contact_us
Bob 002 home
John 001 home-about_us
Now the trick is that the number of rows per session varies: Sequence can run from 1 up to 44, or stop anywhere in between. I am coding in R and have SQLite available. I also need to concatenate in the dashes, but that is easy. If R had something like 'lag' in SAS this would be a snap.
Recommended answer
You already have some excellent answers, but here is a dplyr version that is hopefully a little more readable.
library(dplyr)
df %>%
  group_by(Name, Session) %>%  # create summary data for each unique group
  summarise(Page = paste0(Page, collapse = "-"))
which gives
Source: local data frame [4 x 3]
Groups: Name
Name Session Page
1 Bob 1 home-news-contact_us-home
2 Bob 2 home
3 John 1 home-about_us
4 Sally 1 home-contact_us
Rereading your question, it seems that the sequence of pages is important, i.e. you would like the final Page column to list the pages in the order they were visited, left to right. Therefore, we include an extra step.
library(dplyr)
df %>%
  group_by(Name, Session) %>%  # create summary data for each unique group
  arrange(Sequence) %>%        # sort by Sequence so pages are pasted in visit order
  summarise(Page = paste0(Page, collapse = "-"))
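If dplyr is not available, the same idea (sort by Sequence, then collapse Page within each Name/Session pair) can be sketched in base R. This is only a rough sketch: the data frame below is retyped from the example in the question, and storing Session and Sequence as zero-padded strings is my assumption about how the real data looks.

```r
# Example data retyped from the question. Keeping Session and Sequence as
# zero-padded strings is an assumption; numeric columns would work as well.
df <- data.frame(
  Name     = c("Bob", "Bob", "Bob", "Bob", "Sally", "Sally", "Bob", "John", "John"),
  Session  = c("001", "001", "001", "001", "001", "001", "002", "001", "001"),
  Sequence = c("001", "002", "003", "004", "001", "002", "001", "001", "002"),
  Page     = c("home", "news", "contact_us", "home", "home", "contact_us",
               "home", "home", "about_us"),
  stringsAsFactors = FALSE
)

# Sort the rows so that, within each Name/Session pair, pages are in visit order.
df <- df[order(df$Name, df$Session, df$Sequence), ]

# Collapse the Page values of each group into a single dash-separated string.
# Extra arguments to aggregate() (here collapse = "-") are passed on to paste().
result <- aggregate(Page ~ Name + Session, data = df,
                    FUN = paste, collapse = "-")

print(result)
```

The result has one row per Name/Session pair, with the visited pages joined by dashes in a Page column. The sort has to happen before aggregate(), since aggregate() simply pastes the values in the row order it receives.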