How to spread columns with duplicate identifiers?

Problem description

I have the following tibble:

structure(list(age = c("21", "17", "32", "29", "15"), 
               gender = structure(c(2L, 1L, 1L, 2L, 2L), .Label = c("Female", "Male"), class = "factor")), 
          row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("age", "gender"))

    age gender
  <chr> <fctr>
1    21   Male
2    17 Female
3    32 Female
4    29   Male
5    15   Male

And I am trying to use tidyr::spread to achieve this:

  Female Male
1    NA     21
2    17     NA
3    32     NA
4    NA     29
5    NA     15

I thought spread(gender, age) would work, but I get an error message saying:

Error: Duplicate identifiers for rows (2, 3), (1, 4, 5)

Solution

Right now you have two age values for Female and three for Male, and no other variable that keeps them from being collapsed into a single row. That collapsing is exactly what spread does with values that share the same index values (or have none at all):

library(tidyverse)

df <- tibble(x = c('a', 'b'), y = 1:2)

df    # 2 rows...
#> # A tibble: 2 x 2
#>       x     y
#>   <chr> <int>
#> 1     a     1
#> 2     b     2

df %>% spread(x, y)    # ...become one if there's only one value for each.
#> # A tibble: 1 x 2
#>       a     b
#> * <int> <int>
#> 1     1     2
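By contrast, if two rows share the same key and nothing else tells them apart, spread has nowhere to put the second value and stops with an error, just as in the question. A minimal sketch, assuming tidyr and tibble are installed; the toy data `dup` is hypothetical, and since the exact error wording differs between tidyr versions, only the failure itself is checked:

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: key "a" appears twice and no other column
# distinguishes the two rows.
dup <- tibble(x = c("a", "a", "b"), y = 1:3)

# spread() will not put two values into one cell, so it signals an error.
res <- tryCatch(spread(dup, x, y), error = function(e) e)
inherits(res, "error")
#> [1] TRUE
```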

spread doesn't apply a function to combine multiple values (à la dcast), so the rows must be indexed such that each cell of the result receives exactly one value or none, e.g.

df <- tibble(i = c(1, 1, 2, 2, 3, 3), 
             x = c('a', 'b', 'a', 'b', 'a', 'b'), 
             y = 1:6)

df    # the two rows with each `i` value here...
#> # A tibble: 6 x 3
#>       i     x     y
#>   <dbl> <chr> <int>
#> 1     1     a     1
#> 2     1     b     2
#> 3     2     a     3
#> 4     2     b     4
#> 5     3     a     5
#> 6     3     b     6

df %>% spread(x, y)    # ...become one row here.
#> # A tibble: 3 x 3
#>       i     a     b
#> * <dbl> <int> <int>
#> 1     1     1     2
#> 2     2     3     4
#> 3     3     5     6
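If you actually want duplicates combined rather than kept apart, spread can't do that. tidyr's newer pivot_wider() (assuming tidyr >= 1.0.0 is available) takes a `values_fn` argument that summarises each cell, much like dcast's aggregation function. A sketch on the question's data; taking the maximum age per gender and converting `age` to integer are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(tidyr)
library(tibble)

# The question's data, rebuilt with tibble() for brevity
# (same values and factor levels as the dput() output).
people <- tibble(
  age    = c("21", "17", "32", "29", "15"),
  gender = factor(c("Male", "Female", "Female", "Male", "Male"))
)

# Unlike spread(), pivot_wider() can aggregate duplicates into one cell.
oldest <- people %>%
  mutate(age = as.integer(age)) %>%
  pivot_wider(names_from = gender, values_from = age,
              values_fn = list(age = max))

oldest  # one row: Female = 32, Male = 29
```

With no other columns left to act as an index, every row falls into the same output row, so each gender's ages are reduced to a single value by `max`.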

If your values aren't naturally indexed by the other columns, you can add a unique index column (e.g. by adding the row numbers as a column), which stops spread from trying to collapse the rows:

df <- structure(list(age = c("21", "17", "32", "29", "15"), 
                     gender = structure(c(2L, 1L, 1L, 2L, 2L), 
                                        .Label = c("Female", "Male"), class = "factor")), 
                row.names = c(NA, -5L), 
                class = c("tbl_df", "tbl", "data.frame"), 
                .Names = c("age", "gender"))

df %>% mutate(i = row_number()) %>% spread(gender, age)
#> # A tibble: 5 x 3
#>       i Female  Male
#> * <int>  <chr> <chr>
#> 1     1   <NA>    21
#> 2     2     17  <NA>
#> 3     3     32  <NA>
#> 4     4   <NA>    29
#> 5     5   <NA>    15

If you want to remove the index afterwards, add select(-i) to the pipeline. This doesn't produce a terribly useful data frame in this case, but the trick can be very useful in the middle of more complicated reshaping.
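Putting the pieces together, the sketch below drops the helper column after spreading. It also shows one common variant, an assumption rather than part of the original answer: numbering rows *within* each gender, so the values pack into as many rows as the largest group needs instead of one row per observation. It assumes dplyr, tidyr and tibble are installed:

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(
  age    = c("21", "17", "32", "29", "15"),
  gender = factor(c("Male", "Female", "Female", "Male", "Male"))
)

# One output row per original row; the helper index i is dropped at the end.
# Result: 5 rows, columns Female and Male, NA where the row is the other gender.
wide <- df %>%
  mutate(i = row_number()) %>%
  spread(gender, age) %>%
  select(-i)

# Variant: index within each gender. The largest group (Male) has 3 rows,
# so the result has 3 rows and Female is padded with NA.
packed <- df %>%
  group_by(gender) %>%
  mutate(i = row_number()) %>%
  ungroup() %>%
  spread(gender, age) %>%
  select(-i)
```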
