dplyr n_distinct with condition
Problem description
Using dplyr to summarise a dataset, I want to call n_distinct to count the number of unique occurrences in a column. However, I also want to do another summarise() for all unique occurrences in a column where a condition in another column is satisfied.
Example dataframe named "a":
A B
1 Y
2 N
3 Y
1 Y
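To make the example reproducible, the table above can be rebuilt as a data frame. This is only a sketch: the column types (numeric A, character B) are assumed, since the original post shows just the printed values.

```r
# Rebuild the example data frame "a" from the table above.
# Assumption: A is numeric and B is a character column.
a <- data.frame(
  A = c(1, 2, 3, 1),
  B = c("Y", "N", "Y", "Y"),
  stringsAsFactors = FALSE
)
```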
a %>% summarise(count = n_distinct(A))
However, I also want to add a count of n_distinct(A) where B == "Y".
Without the condition, the result should be:
count
3
With the condition added, the result should be:
count
2
The end result I am trying to achieve is both statements merged into one call that gives me a result like:
count_all count_BisY
3 2
What is the appropriate way to go about this with dplyr?
An alternative is to use the uniqueN function from data.table inside dplyr:
library(dplyr)
library(data.table)
a %>% summarise(count_all = n_distinct(A), count_BisY = uniqueN(A[B == 'Y']))
which gives:
count_all count_BisY
1 3 2
You can also do everything with data.table:
library(data.table)
setDT(a)[, .(count_all = uniqueN(A), count_BisY = uniqueN(A[B == 'Y']))]
which gives the same result.
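The same subsetting idea should also work with n_distinct itself, so data.table is not strictly required. A minimal sketch that stays within dplyr, rebuilding the example data with assumed column types:

```r
library(dplyr)

# Example data copied from the question (column types are assumed).
a <- data.frame(A = c(1, 2, 3, 1), B = c("Y", "N", "Y", "Y"))

# Subset A by the condition on B before counting distinct values.
res <- a %>%
  summarise(
    count_all  = n_distinct(A),
    count_BisY = n_distinct(A[B == "Y"])
  )
print(res)
```

Here A[B == "Y"] keeps only the values of A in rows where B is "Y" (1, 3, 1), so the second count is 2 while the unconditional count is 3.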