Spread with duplicate identifiers (using tidyverse and %>%)


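Problem description

I would like to reshape the data frame below from long to wide format, with one 'start' column and one 'end' column per id, and I would like to do this in tidyverse using %>%-chaining.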

df <- 
structure(list(id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L), start_end = structure(c(2L, 
1L, 2L, 2L, 1L, 2L, 1L), .Label = c("end", "start"), class = "factor"), 
    date = structure(c(6L, 7L, 3L, 8L, 9L, 10L, 11L), .Label = c("1979-01-03", 
    "1979-06-21", "1979-07-18", "1989-09-12", "1991-01-04", "1994-05-01", 
    "1996-11-04", "2005-02-01", "2009-09-17", "2010-10-01", "2012-10-06"
    ), class = "factor")), .Names = c("id", "start_end", "date"
), row.names = c(3L, 4L, 7L, 8L, 9L, 10L, 11L), class = "data.frame")

What I have tried:

data.table::dcast( df, formula = id ~ start_end, value.var = "date", drop = FALSE )  # does not work because it summarises the data

tidyr::spread( df, start_end, date )  # does not work because of duplicate values


df$id2 <- 1:nrow(df)
tidyr::spread( df, start_end, date ) # does not work because the dataset now has too many rows.
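
For context, spread() complains when several rows share the same combination of identifier and key. The sketch below is only an illustration: it rebuilds df in a compact form (dates as character rather than factor, which is an assumption made for brevity) and counts rows per id/start_end pair. The names dupes and n_rows are hypothetical.

```r
library(dplyr)

# Compact rebuild of the example data (character columns instead of factors)
df <- data.frame(
  id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L),
  start_end = c("start", "end", "start", "start", "end", "start", "end"),
  date = c("1994-05-01", "1996-11-04", "1979-07-18", "2005-02-01",
           "2009-09-17", "2010-10-01", "2012-10-06"),
  stringsAsFactors = FALSE
)

# Keep only the id/start_end combinations that occur more than once
dupes <- df %>%
  count(id, start_end, name = "n_rows") %>%
  filter(n_rows > 1)

dupes
```

Only id 5 should show up here, once for 'start' and once for 'end'. Those are the rows that need an extra distinguishing column before they can be spread.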

These questions do not answer my question:

Using spread with duplicate identifiers for rows (because they summarise)

R: spread function on data frame with duplicates (because they paste the values together)

Reshaping data in R with "login" "logout" times (because not specifically asking for/answered using tidyverse and chaining)

Recommended answer

We can use the tidyverse. After grouping by 'id' and 'start_end', create a sequence column 'ind' that numbers the rows within each group, so that each (id, ind) pair has at most one 'start' and one 'end'. Then spread from 'long' to 'wide' format.

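library(dplyr)
library(tidyr)
df %>%
   group_by(id, start_end) %>%
   mutate(ind = row_number()) %>%   # 1, 2, ... within each id/start_end pair
   ungroup() %>%                    # drop grouping so select() adds no extra columns
   spread(start_end, date) %>%      # 'ind' keeps repeated pairs on separate rows
   select(id, start, end)           # drop the helper column 'ind'
#     id      start        end
#  <int>     <fctr>     <fctr>
#1     2 1994-05-01 1996-11-04
#2     4 1979-07-18         NA
#3     5 2005-02-01 2009-09-17
#4     5 2010-10-01 2012-10-06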
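
Since tidyr 1.0.0, spread() is superseded by pivot_wider(). The same idea can be sketched with it as follows. This is not part of the original answer: it assumes tidyr >= 1.0.0 and uses a compact rebuild of df with character columns, and the name wide is hypothetical.

```r
library(dplyr)
library(tidyr)

# Compact rebuild of the example data (character columns instead of factors)
df <- data.frame(
  id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L),
  start_end = c("start", "end", "start", "start", "end", "start", "end"),
  date = c("1994-05-01", "1996-11-04", "1979-07-18", "2005-02-01",
           "2009-09-17", "2010-10-01", "2012-10-06"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(id, start_end) %>%
  mutate(ind = row_number()) %>%   # occurrence index within each id/start_end pair
  ungroup() %>%
  pivot_wider(names_from = start_end, values_from = date) %>%
  select(id, start, end)           # drop the helper column 'ind'

wide
```

Here id and ind together act as the row identifier, so the two start/end pairs for id 5 end up on separate rows, and the missing end for id 4 is filled with NA.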

Or, using tidyr 1.0.0:

chop(df, date) %>%
     spread(start_end, date) %>%
     unnest(c(start, end))
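
To see why this works, it may help to look at the intermediate result of chop(). A minimal sketch, assuming tidyr >= 1.0.0 and a compact rebuild of df with character columns (the name chopped is hypothetical):

```r
library(tidyr)

# Compact rebuild of the example data (character columns instead of factors)
df <- data.frame(
  id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L),
  start_end = c("start", "end", "start", "start", "end", "start", "end"),
  date = c("1994-05-01", "1996-11-04", "1979-07-18", "2005-02-01",
           "2009-09-17", "2010-10-01", "2012-10-06"),
  stringsAsFactors = FALSE
)

# chop() collapses rows that agree on all other columns (id, start_end)
# into one row and turns 'date' into a list-column
chopped <- chop(df, date)

chopped

# Number of dates stored in each list element
vapply(chopped$date, length, integer(1))
```

After chopping, each id/start_end pair appears exactly once, so spread() no longer sees duplicate identifiers. unnest() then expands the list-columns back into one row per date.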
