Unnest or unchop dataframe containing lists of different lengths

Problem description

I have a dataframe with several list columns that I want to unnest (or unchop). However, within each row the lists have different lengths, so the result is the error Error: No common size for...

Here is a reprex to show what works and doesn't work.

library(tidyr)
library(vctrs)

# This works as expected
df_A <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9)))
)

unchop(df_A, cols = c(A))
# A tibble: 7 x 2
     ID     A
  <int> <dbl>
1     1     9
2     1     8
3     1     5
4     2     7
5     2     6
6     3     6
7     3     9

# This works as expected because A and B have the same length within each row

df_AB_1 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9))),
  B = as_list_of(list(c(1, 2, 3), c(4, 5), c(7, 8)))
)

unchop(df_AB_1, cols = c(A, B))

# A tibble: 7 x 3
     ID     A     B
  <int> <dbl> <dbl>
1     1     9     1
2     1     8     2
3     1     5     3
4     2     7     4
5     2     6     5
6     3     6     7
7     3     9     8

# This does NOT work because A and B have different lengths within each row

df_AB_2 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9))),
  B = as_list_of(list(c(1, 2), c(4, 5, 6), c(7, 8, 9, 0)))
)

unchop(df_AB_2, cols = c(A, B))

# Error: No common size for `A`, size 3, and `B`, size 2.
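To see why unchop() complains, it helps to compare the per-row lengths of the two list columns: unchop() needs the listed columns to agree in size within each row before it can expand them side by side. A small diagnostic sketch (the sizes and mismatched names are illustrative only, not part of the original reprex):

```r
library(tidyr)
library(vctrs)

df_AB_2 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7, 6), c(6, 9))),
  B = as_list_of(list(c(1, 2), c(4, 5, 6), c(7, 8, 9, 0)))
)

# Length of each list element, row by row
sizes <- data.frame(
  ID = df_AB_2$ID,
  A = lengths(df_AB_2$A),  # 3 2 2
  B = lengths(df_AB_2$B)   # 2 3 4
)

# Rows where A and B disagree: here every row (1 2 3), so there is no common size
mismatched <- sizes$ID[sizes$A != sizes$B]
mismatched
```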

The output I would like for df_AB_2 above is shown below: every list is unchopped in full (nothing is truncated), and the missing positions in the shorter list are filled with NA:

# A tibble: 10 x 3
      ID     A     B
   <dbl> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0

I have referenced this issue on GitHub and on Stack Overflow here.

Any ideas on how to achieve the result above?

Versions

> packageVersion("tidyr")
[1] ‘1.0.0’
> packageVersion("vctrs")
[1] ‘0.2.0.9001’

Solution

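Here is an idea using dplyr and tidyr that you can generalise to as many list columns as you want (the 2 in the final filter() call is the number of list columns, so change it accordingly):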

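library(tidyverse)

df_AB_2 %>% 
  # stack A and B into one list column `value`; `name` records the source column
  pivot_longer(c(A, B)) %>% 
  # pad every vector with NA up to the longest vector in the whole column
  mutate(value = lapply(value, `length<-`, max(lengths(value)))) %>% 
  # back to one list column per variable; all vectors now share the same length
  pivot_wider(names_from = name, values_from = value) %>% 
  unnest(cols = c(A, B)) %>% 
  # drop padding rows where every list column is NA (2 = number of list columns)
  filter(rowSums(is.na(.[-1])) != 2)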

which gives:

# A tibble: 10 x 3
      ID     A     B
   <int> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0
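The answer pads every vector to the longest vector in the whole column (4 here), which is why the final filter() has to remove rows where both A and B are NA. A possible alternative, not part of the original answer, is to pad each row only to its own longest element, so no filtering step is needed and rows that are genuinely all-NA in the data are kept. This is a sketch under assumptions: the helper name unchop_ragged is hypothetical, and it relies on tidyselect's all_of() (tidyselect 1.0 or later, which recent tidyr installs).

```r
library(tidyr)
library(vctrs)

df_AB_2 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7, 6), c(6, 9))),
  B = as_list_of(list(c(1, 2), c(4, 5, 6), c(7, 8, 9, 0)))
)

# Hypothetical helper: NA-pad the list columns named in `cols` to a common
# length within each row, then unchop them together.
unchop_ragged <- function(df, cols) {
  # Longest element across the chosen list columns, row by row
  n <- do.call(pmax, unname(lapply(df[cols], lengths)))
  for (col in cols) {
    # `length<-` extends a vector with NA, e.g. c(7, 6) becomes c(7, 6, NA)
    df[[col]] <- Map(`length<-`, df[[col]], n)
  }
  unchop(df, cols = tidyselect::all_of(cols))
}

res <- unchop_ragged(df_AB_2, c("A", "B"))
res
```

For df_AB_2 this yields the same 10 rows as the desired output; the list columns are passed by name, so more columns can be added to the cols vector without changing the helper.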
