Compress rows with common IDs into one row


Question

I have a question that I have not found an answer for. There are similar questions, but their solutions don't quite work in my situation. I have a data set with four columns, like this example:

Name   Session   Sequence   Page
Bob     001       001       home
Bob     001       002       news
Bob     001       003       contact_us
Bob     001       004       home
Sally   001       001       home
Sally   001       002       contact_us
Bob     002       001       home
John    001       001       home
John    001       002       about_us

What I want is something like this:

Name    Session   Pages
Bob     001       home-news-contact_us-home
Sally   001       home-contact_us
Bob     002       home
John    001       home-about_us

The trick is that Sequence can be anything from 1 to 44. I am coding in R and have SQLite available. I also need to concatenate the pages with dashes, but that is easy. If R had something like 'lag' in SAS, this would be a snap.

Recommended answer

You already have some excellent answers, but here is a dplyr one, which hopefully adds some readability.

library(dplyr)

df %>%
    group_by(Name, Session) %>% # create summary data for each unique group
    summarise(Page = paste0(Page, collapse = "-")) 

This gives:

Source: local data frame [4 x 3]
Groups: Name

   Name Session                      Page
1   Bob       1 home-news-contact_us-home
2   Bob       2                      home
3  John       1             home-about_us
4 Sally       1           home-contact_us
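
The snippet above assumes a data frame named df already exists. A minimal sketch for reproducing the result could look like the following; the numeric types for Session and Sequence and the name result are assumptions for illustration, not something the question specifies.

```r
library(dplyr)

# Example data copied from the question.
# Assumption: Session and Sequence are stored as numbers, not "001"-style strings.
df <- data.frame(
  Name     = c("Bob", "Bob", "Bob", "Bob", "Sally", "Sally", "Bob", "John", "John"),
  Session  = c(1, 1, 1, 1, 1, 1, 2, 1, 1),
  Sequence = c(1, 2, 3, 4, 1, 2, 1, 1, 2),
  Page     = c("home", "news", "contact_us", "home", "home", "contact_us",
               "home", "home", "about_us"),
  stringsAsFactors = FALSE
)

# One row per Name/Session pair, with the pages joined by dashes.
result <- df %>%
  group_by(Name, Session) %>%
  summarise(Page = paste0(Page, collapse = "-")) %>%
  ungroup()  # drop any grouping left over after summarise()

print(result)
```

Because the example rows are already in Sequence order within each session, the pages come out in the order visited even without an explicit sort.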

Rereading your question, it seems that the sequence of pages is important, i.e. you would like the final Page column to list the pages in the order they were visited, left to right. Therefore, we include an extra step.

library(dplyr)

df %>%
    group_by(Name, Session) %>% # create summary data for each unique group
    arrange(Sequence) %>% # makes sure that Sequence for each group is in ascending order.
    summarise(Page = paste0(Page, collapse = "-")) 
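
To see why the ordering step matters, here is a small sketch with deliberately shuffled rows. The names visits and ordered_pages are hypothetical. It sorts explicitly by Name, Session and Sequence before grouping, which is a variation on the code above rather than the same call; as far as I know, how arrange() treats grouped data has differed between dplyr versions, and an explicit sort sidesteps that.

```r
library(dplyr)

# Hypothetical input whose rows are NOT in visit order.
visits <- data.frame(
  Name     = c("Bob", "Bob", "Bob", "Sally", "Sally"),
  Session  = c(1, 1, 1, 1, 1),
  Sequence = c(3, 1, 2, 2, 1),
  Page     = c("contact_us", "home", "news", "contact_us", "home"),
  stringsAsFactors = FALSE
)

ordered_pages <- visits %>%
  arrange(Name, Session, Sequence) %>%  # sort rows into visit order first
  group_by(Name, Session) %>%
  summarise(Page = paste0(Page, collapse = "-")) %>%
  ungroup()

print(ordered_pages)
```

Without the arrange() step, Bob's pages would be pasted together in row order (contact_us-home-news) instead of visit order (home-news-contact_us).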
