tidyr: multiple unnesting with varying NA counts

Question

I'm confused about some tidyr behavior. I can unnest a single response like this:

library(tidyr)

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

tidy <- data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest()

# Source: local data frame [6 x 3]
#
#      resp2   resp3 resp1
#      (chr)   (chr) (chr)
# 1 C; D; F      NA     A
# 2      NA      NA     B
# 3      NA      NA     A
# 4    C; F G; H; I     B
# 5       D    H; I    NA
# 6       E       I     B

But I need to unnest multiple columns in my dataset, and the columns have varying numbers of NAs. I tried this and it threw an error:

data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# Error: All nested columns must have the same number of elements.

I expected the code above would give me the same output as the following:

# unnesting multiple response (desired output / is there a better way?)
data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest() %>%
  transform(resp2 = strsplit(resp2, "; ")) %>%
  unnest() %>%
  transform(resp3 = strsplit(resp3, "; ")) %>%
  unnest()

#     resp1 resp2 resp3
#     (chr) (chr) (chr)
# 1      A     C    NA
# 2      A     D    NA
# 3      A     F    NA
# 4      B    NA    NA
# 5      A    NA    NA
# 6      B     C     G
# 7      B     C     H
# 8      B     C     I
# 9      B     F     G
# 10     B     F     H
# 11     B     F     I
# 12    NA     D     H
# 13    NA     D     I
# 14     B     E     I

I'm new to R, but this feels clunky and makes me wonder whether I'm misusing something. What's going on with the failed attempt to unnest multiple columns at once?

Recommended answer

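Check this link, which shows a case of unnesting multiple columns that differs from yours. Judging by the documentation and the linked example, unless there is some clever way to do this, unnesting several columns in one call appears to be defined only when the nested columns have the same number of elements in each row (as the error message says). Otherwise it would be ambiguous how the elements should be paired up.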
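To see where the mismatch comes from, it can help to count how many pieces each cell splits into. The following is a minimal base R sketch using the data from the question; the names `piece_counts` and `mismatched` are made up for illustration and are not part of tidyr.

```r
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

# Number of pieces per cell after splitting on "; ".
# strsplit() keeps an NA cell as a single NA piece, so it counts as 1.
piece_counts <- sapply(data, function(col) lengths(strsplit(col, "; ")))
piece_counts

# Rows whose three counts are not all equal cannot be unnested in one call.
mismatched <- apply(piece_counts, 1, function(n) length(unique(n)) > 1)
which(mismatched)

# Unnesting one column at a time crosses the pieces within each row,
# so the final row count is the sum of the per-row products.
sum(apply(piece_counts, 1, prod))
```

Only the last row has matching counts across all three columns, which is why the single `unnest()` call fails. The final sum is the number of rows that the one-column-at-a-time approach produces.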

So you may have to unnest your columns one by one. The code below may still be cumbersome, but it simplifies things a little.

library(tidyr)
library(dplyr)

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)
data
#   resp1   resp2   resp3
# 1     A C; D; F    <NA>
# 2  B; A    <NA>    <NA>
# 3     B    C; F G; H; I
# 4  <NA>       D    H; I
# 5     B       E       I

# Split all three columns into list-columns first, then unnest one at a time.
data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest(resp1) %>%
  unnest(resp2) %>%
  unnest(resp3)
#    resp1 resp2 resp3
# 1      A     C  <NA>
# 2      A     D  <NA>
# 3      A     F  <NA>
# 4      B  <NA>  <NA>
# 5      A  <NA>  <NA>
# 6      B     C     G
# 7      B     C     H
# 8      B     C     I
# 9      B     F     G
# 10     B     F     H
# 11     B     F     I
# 12  <NA>     D     H
# 13  <NA>     D     I
# 14     B     E     I
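As a possible alternative (not part of the original answer), tidyr also provides `separate_rows()`, which splits a delimited column and expands it into rows in one step. The sketch below assumes a tidyr version that includes `separate_rows()`. It is still applied one column at a time, because the per-row piece counts differ between columns. The name `long` is just an illustrative choice.

```r
library(tidyr)

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

# Split and expand each column separately; NA cells stay as single NA rows.
long <- data %>%
  separate_rows(resp1, sep = "; ") %>%
  separate_rows(resp2, sep = "; ") %>%
  separate_rows(resp3, sep = "; ")

long
```

This avoids the intermediate `transform()`/`strsplit()` step, at the cost of one call per column.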
