Select and plot top frequencies with dplyr
Question
The objective is to select the top 3 (or n) events with the largest frequencies (occurrences) in a data frame, then plot them as a bar plot with ggplot2.
Example:
library(dplyr)
df <- data.frame(
  type = c("car", "bike", "horse", "boat", "yacht", "train"),
  freq = c(20, 2, 5, 60, 11, 10)
)
So far, I have been able to sort df with arrange:
df_order <- df %>%
  arrange(desc(freq))
df_order
type freq
1 boat 60
2 car 20
3 yacht 11
4 train 10
5 horse 5
6 bike 2
The desired result is to select only the top 3 types, then plot them using a bar plot. I think count will be useful, but I am not sure how to do that. Any ideas?
Recommended answer
After ordering the dataset by the 'freq' column (arrange(...)), we can take the top 3 rows with slice, pass the result to ggplot, specify the 'x' and 'y' variables in aes, and draw the bars with geom_bar.
library(ggplot2)
library(dplyr)
df %>%
  arrange(desc(freq)) %>%
  slice(1:3) %>%
  ggplot(., aes(x = type, y = freq)) +
  geom_bar(stat = 'identity')
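Note that ggplot2 places a discrete x axis in alphabetical (or factor-level) order, so the bars above will not necessarily appear from most to least frequent even though the rows were sorted. A minimal sketch of one way to keep the bars in frequency order, assuming the same df as in the question; the names top3 and p are introduced here only for illustration:

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  type = c("car", "bike", "horse", "boat", "yacht", "train"),
  freq = c(20, 2, 5, 60, 11, 10)
)

# Keep the three rows with the largest freq (hypothetical helper name: top3).
top3 <- df %>%
  arrange(desc(freq)) %>%
  slice(1:3)

# reorder(type, -freq) orders the x axis by descending freq;
# geom_col() is the shorthand for geom_bar(stat = "identity").
p <- ggplot(top3, aes(x = reorder(type, -freq), y = freq)) +
  geom_col() +
  labs(x = "type", y = "freq")

print(p)
```

Using reorder inside aes leaves the data frame itself untouched; converting type to a factor with explicit levels beforehand would be an alternative.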
Another option is top_n, a convenience wrapper that uses filter and min_rank to select the top 'n' (3) observations by the 'freq' column; the result can then be passed to ggplot as above.
top_n(df, n = 3, freq) %>%
  ggplot(., aes(x = type, y = freq)) +
  geom_bar(stat = 'identity')
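In dplyr 1.0.0 and later, top_n is marked as superseded in favour of slice_max, which does the same job with a more explicit interface. The following is a sketch under that assumption (it will not run on older dplyr versions); top3 and p are illustrative names, not part of the original answer:

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  type = c("car", "bike", "horse", "boat", "yacht", "train"),
  freq = c(20, 2, 5, 60, 11, 10)
)

# slice_max() keeps the n rows with the largest freq.
# with_ties = TRUE (the default) may return more than n rows when values tie,
# which mirrors top_n(); set with_ties = FALSE to get exactly n rows.
top3 <- df %>%
  slice_max(freq, n = 3)

p <- ggplot(top3, aes(x = type, y = freq)) +
  geom_bar(stat = "identity")

print(p)
```

Like top_n, the default behaviour keeps tied rows, so a request for 3 rows can yield more when several types share the third-largest frequency.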