How do I remove NAs with the tidyr::unite function?
Question
After combining several columns with tidyr::unite(), NAs from missing data remain in my character vector, which I do not want.
I have a series of medical diagnoses per row (one per column) and would like to benchmark searching for a set of codes via %in% and grepl().
There is an open issue on GitHub about this problem. Has there been any movement on it, or are there any workarounds? I would like to keep the vector comma-separated.
Here is a representative example:
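library(dplyr)
library(tidyr)

# tibble() replaces the deprecated data_frame(); a single space marks a missing value
df <- tibble(a = paste0("A.", rep(1, 3)), b = " ", c = c("C.1", "C.3", " "), d = "D.4", e = "E.5")
cols <- letters[2:4]

# Replace the blank cells in columns b, c and d with NA
df[, cols] <- gsub(" ", NA_character_, as.matrix(df[, cols]))

# all_of() states explicitly that cols is a character vector of column names
tidyr::unite(df, new, all_of(cols), sep = ",")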
Current output:
# # A tibble: 3 x 3
#   a     new        e
#   <chr> <chr>      <chr>
# 1 A.1   NA,C.1,D.4 E.5
# 2 A.1   NA,C.3,D.4 E.5
# 3 A.1   NA,NA,D.4  E.5
Desired output:
# # A tibble: 3 x 3
#   a     new     e
#   <chr> <chr>   <chr>
# 1 A.1   C.1,D.4 E.5
# 2 A.1   C.3,D.4 E.5
# 3 A.1   D.4     E.5
Recommended answer
You could use a regular expression to remove the NAs after unite() has pasted them into the string:
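library(dplyr)
library(tidyr)

# Example data: a single space marks a missing diagnosis
df <- tibble(a = paste0("A.", rep(1, 3)),
             b = " ",
             c = c("C.1", "C.3", " "),
             d = "D.4", e = "E.5")
cols <- letters[2:4]

# Replace the blank cells in columns b, c and d with NA
df[, cols] <- gsub(" ", NA_character_, as.matrix(df[, cols]))

# Unite the columns, then strip each literal "NA" and the comma after it
tidyr::unite(df, new, all_of(cols), sep = ",") %>%
  dplyr::mutate(new = stringr::str_replace_all(new, 'NA,?', '')) # New line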
Output:
# A tibble: 3 x 3
  a     new     e
  <chr> <chr>   <chr>
1 A.1   C.1,D.4 E.5
2 A.1   C.3,D.4 E.5
3 A.1   D.4     E.5
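One caveat about the pattern 'NA,?': it matches the letters "NA" anywhere in the string, so it can damage a code that happens to contain them, and it leaves a trailing comma when the missing value is the last field. The sketch below shows both cases and one possible stricter alternative. The sample strings and the helper name strip_na are hypothetical and chosen only for illustration. Base R's gsub() is used here in place of stringr::str_replace_all(); for this simple pattern the two should behave the same.

```r
# Hypothetical united strings, including two edge cases
x <- c("NA,C.1,D.4", "C.1,NA", "DNA,C.1", "NA,NA,D.4")

# The pattern from the answer: note the trailing comma in the second
# result and the damaged "DNA" code in the third
gsub("NA,?", "", x)

# Stricter sketch: split on commas and drop only the fields that are
# exactly "NA", then paste the remaining fields back together
strip_na <- function(s) {
  fields <- strsplit(s, ",", fixed = TRUE)
  vapply(fields,
         function(f) paste(f[f != "NA"], collapse = ","),
         character(1))
}

strip_na(x)
```

With the example data from the question, both approaches give the same result, so the stricter version only matters if real diagnosis codes can contain "NA" or if the last united column can be missing.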
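If a recent version of tidyr is available, the regular expression may not be needed at all: since tidyr 1.0.0, unite() has an na.rm argument that drops missing values before the columns are pasted together. The following is a minimal sketch under that assumption. It builds the example data with NA values directly rather than replacing blanks, and the name out is only illustrative.

```r
library(tidyr)
library(tibble)

# Same shape as the question's data, with the missing values already NA
df <- tibble(
  a = "A.1",
  b = NA_character_,
  c = c("C.1", "C.3", NA),
  d = "D.4",
  e = "E.5"
)

# na.rm = TRUE removes NAs before uniting (assumes tidyr >= 1.0.0)
out <- unite(df, "new", b:d, sep = ",", na.rm = TRUE)
out$new
```

This keeps the vector comma-separated, as requested in the question, and avoids editing the strings after the fact. Note that it only helps when the columns hold real NA values. Blank strings such as " " would still need to be converted first, as in the example above.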