Distinct in dplyr does not work (sometimes)
Question
I have the following data frame, which I obtained from a count. I used dput to make the data frame available and then edited it so that there is a duplicate of A.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
print(df)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
Now I would like to apply distinct() on Procedure and keep only the first A.
library(dplyr)
df %>%
  distinct(Procedure, .keep_all = TRUE)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
It does not work. Strange...
Recommended answer
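If we print the Procedure column, we can see that there are duplicated levels for A, which is problematic for the distinct function: the two A rows are stored under different integer codes (1 and 2) that share the same label, so distinct treats them as different values.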
df$Procedure
[1] D A A C
Levels: A A C D -1
Warning message:
In print.factor(x) : duplicated level [2] in factor
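A minimal base-R sketch of this diagnosis, assuming only the factor column is needed: it rebuilds the Procedure column on its own (the variable names proc, codes, dup and lab are introduced here purely for illustration) and inspects the integer codes that sit behind the printed labels.

```r
# Hypothetical standalone copy of the question's Procedure column
# (same integer codes and the same duplicated "A" level).
proc <- structure(c(4L, 1L, 2L, 3L),
                  .Label = c("A", "A", "C", "D", "-1"),
                  class = "factor")

# Integer codes behind the labels: the two "A" rows use codes 1 and 2.
codes <- as.integer(proc)
print(codes)   # 4 1 2 3

# Index of the first duplicated level; 0 would mean the levels are unique.
dup <- anyDuplicated(levels(proc))
print(dup)     # 2

# Comparing by label instead shows which rows ought to collapse.
lab <- as.character(proc)
print(lab)     # "D" "A" "A" "C"
```

Because anyDuplicated() returns 0 for a factor whose levels are unique, the same one-line check can be run on any other factor column that behaves oddly.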
One way to fix this is to rebuild the factor with the factor function, which recomputes the levels from the values so that the duplicated A levels collapse into one. Another way is to convert the Procedure column to character.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
library(tidyverse)
df %>%
  mutate(Procedure = factor(Procedure)) %>%
  distinct(Procedure, .keep_all = TRUE)
# # A tibble: 3 x 2
# Procedure n
# <fct> <int>
# 1 D 10717
# 2 A 4412
# 3 C 1480
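The character route mentioned in the answer can be sketched as follows, assuming dplyr is installed (library(dplyr) is loaded instead of the full tidyverse, since only mutate(), distinct() and %>% are used; the result name res is illustrative).

```r
library(dplyr)

# Same data as in the question: Procedure is a factor with a duplicated "A" level.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L),
                                           .Label = c("A", "A", "C", "D", "-1"),
                                           class = "factor"),
                     n = c(10717L, 4412L, 2058L, 1480L)),
                class = c("tbl_df", "tbl", "data.frame"),
                row.names = c(NA, -4L))

# Compare rows by label: as.character() maps codes 1 and 2 both to "A",
# so distinct() keeps only the first "A" row (n = 4412).
res <- df %>%
  mutate(Procedure = as.character(Procedure)) %>%
  distinct(Procedure, .keep_all = TRUE)

print(res)
```

Unlike the factor() fix, this leaves Procedure as a character column; wrap it in factor() again afterwards if a factor is needed downstream.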