Unnest or unchop dataframe containing lists of different lengths
I have a data frame with several list columns that I want to unnest (or unchop). However, the list elements have different lengths, so the call fails with Error: No common size for...
Here is a reprex showing what works and what doesn't.
library(tidyr)
library(vctrs)
# This works as expected
df_A <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7, 6), c(6, 9)))
)
unchop(df_A, cols = c(A))
# A tibble: 7 x 2
     ID     A
  <int> <dbl>
1     1     9
2     1     8
3     1     5
4     2     7
5     2     6
6     3     6
7     3     9
# This works as expected as the lists are the same lengths
df_AB_1 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7, 6), c(6, 9))),
  B = as_list_of(list(c(1, 2, 3), c(4, 5), c(7, 8)))
)
unchop(df_AB_1, cols = c(A, B))
# A tibble: 7 x 3
     ID     A     B
  <int> <dbl> <dbl>
1     1     9     1
2     1     8     2
3     1     5     3
4     2     7     4
5     2     6     5
6     3     6     7
7     3     9     8
# This does NOT work as the lists are different lengths
df_AB_2 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7, 6), c(6, 9))),
  B = as_list_of(list(c(1, 2), c(4, 5, 6), c(7, 8, 9, 0)))
)
unchop(df_AB_2, cols = c(A, B))
# Error: No common size for `A`, size 3, and `B`, size 2.
The output I would like for df_AB_2 above is as follows, where each list is unchopped and missing values are filled with NA:
# A tibble: 10 x 3
      ID     A     B
   <dbl> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0
I have referenced this issue on Github and StackOverflow here.
Any ideas how to achieve the result above?
Versions
> packageVersion("tidyr")
[1] ‘1.0.0’
> packageVersion("vctrs")
[1] ‘0.2.0.9001’
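Here is an idea using dplyr and tidyr that you can generalise to as many columns as you want: pivot the list columns to long format, pad every vector with NA up to the longest length, pivot back to wide, unnest, and then drop the rows where both A and B are NA: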
library(tidyverse)
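df_AB_2 %>%
  pivot_longer(c(A, B)) %>%
  # pad every vector with NA up to the longest length found in either column
  mutate(value = lapply(value, `length<-`, max(lengths(value)))) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  # name the columns explicitly; tidyr 1.0.0 warns when unnest() is called without cols
  unnest(c(A, B)) %>%
  # drop the padding-only rows (NA in both A and B)
  filter(rowSums(is.na(.[-1])) != 2)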
which gives,
# A tibble: 10 x 3
      ID     A     B
   <int> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0
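One possible variation, offered as a sketch rather than a tested replacement: pad A and B row by row to the longer of the two lengths, then call unchop() directly. This avoids the final filter step, which would also drop any row where the original data genuinely had NA in both columns at the same position. The helper pad_to and the temporary column n are names made up for this illustration, and plain lists are used instead of as_list_of() to keep the example self-contained.

```r
library(dplyr)
library(tidyr)

# Same data as df_AB_2 in the question, built with plain list columns
df_AB_2 <- tibble(
  ID = 1:3,
  A = list(c(9, 8, 5), c(7, 6), c(6, 9)),
  B = list(c(1, 2), c(4, 5, 6), c(7, 8, 9, 0))
)

# Hypothetical helper: extend a vector to length n, filling with NA
pad_to <- function(x, n) {
  length(x) <- n
  x
}

result <- df_AB_2 %>%
  # per-row target length: the longer of the two list elements
  mutate(n = pmax(lengths(A), lengths(B))) %>%
  # pad both columns to that length, row by row
  mutate(
    A = Map(pad_to, A, n),
    B = Map(pad_to, B, n)
  ) %>%
  select(-n) %>%
  # every row now has equal-length A and B, so unchop() has a common size
  unchop(c(A, B))

result
```

Because the padding is computed per row, no row is ever longer than it needs to be, so nothing has to be filtered out afterwards. For more than two list columns, the pmax() call and the Map() calls would need to be extended to each column.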