Why is my stacked area graph in ggplot2 empty?

Problem description

I am trying to generate a stacked area graph in R using the command below:

ggplot(p_ash_r_100,aes(x=SMPL_TIME,y=SMPL_CNT,col=EVENT,group=1))+ geom_area()

Here EVENT is the third variable, which I want to chart against the sample time and sample counts from the Oracle DB.

But the graph produced by the command above is empty.

My questions are:

  1. How do I fix the empty graph?

  2. How can I keep only the top 10 events, ranked by the amount of data, either when plotting or earlier? This is easy to do in Excel, as shown in the image.

My dataset is as follows:

> p_ash_r_100
          SMPL_TIME        SQL_ID                        MODULE                        EVENT SMPL_CNT
1   11-APR-17 09:00 03d5x9busf1d8                      SQL*Plus                          CPU        1
2   11-APR-17 09:00 2pb7bzzadj0pn OGG-RCASI004-OPEN_DATA_SOURCE      db file sequential read        1
3   11-APR-17 09:00        NO_SQL                    GoldenGate                          CPU        1
4   11-APR-17 09:00        NO_SQL                    MMON_SLAVE                          CPU        1
5   11-APR-17 09:00        NO_SQL                        NO_SQL              Log archive I/O        1
6   11-APR-17 09:00        NO_SQL                       XStream                          CPU        1
7   11-APR-17 09:00 acuzxh557cq81                    GoldenGate      db file sequential read        1
8   11-APR-17 09:00 cqtby4bsrmxzh                    GoldenGate                          CPU        1
9   11-APR-17 09:00 dgzp3at57cagd                    GoldenGate      db file sequential read        2
10  11-APR-17 09:00 fjp9t92a5yx1v                    GoldenGate      db file sequential read        1
11  11-APR-17 09:00 guh1sva39p9db                    GoldenGate      db file sequential read        1
12  11-APR-17 09:01 0hz0dhgwk12cd                    GoldenGate            direct path write        1
13  11-APR-17 09:01 2jafq5d4n0akv                    GoldenGate                          CPU        1
14  11-APR-17 09:01 37cspa0acgqxp                    GoldenGate      db file sequential read        2
15  11-APR-17 09:01 79rugrngrvpt1 OGG-RADDR025-OPEN_DATA_SOURCE      db file sequential read        1
16  11-APR-17 09:01 7k6zp92kbv28m                    GoldenGate                          CPU        1
17  11-APR-17 09:01 7nvtkfc0bt8vv                    GoldenGate      db file sequential read        1
18  11-APR-17 09:01 7pvpzvd1g769d                    GoldenGate                          CPU        1
19  11-APR-17 09:01 9gduk46rmt5jy                    GoldenGate      db file sequential read        1
20  11-APR-17 09:01        NO_SQL                    GoldenGate                          CPU        7
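
To make the answer's code runnable without access to the Oracle DB, the listing above can be rebuilt as a data frame. This is a sketch that copies the 20 printed rows; it assumes SMPL_TIME is stored as character (as the answer below infers from the listing) and SMPL_CNT as an integer.

```r
# Reproducible copy of the 20 rows printed above.
# Assumption: SMPL_TIME is character and SMPL_CNT is integer.
p_ash_r_100 <- data.frame(
  SMPL_TIME = rep(c("11-APR-17 09:00", "11-APR-17 09:01"), times = c(11, 9)),
  SQL_ID = c("03d5x9busf1d8", "2pb7bzzadj0pn", "NO_SQL", "NO_SQL", "NO_SQL",
             "NO_SQL", "acuzxh557cq81", "cqtby4bsrmxzh", "dgzp3at57cagd",
             "fjp9t92a5yx1v", "guh1sva39p9db", "0hz0dhgwk12cd",
             "2jafq5d4n0akv", "37cspa0acgqxp", "79rugrngrvpt1",
             "7k6zp92kbv28m", "7nvtkfc0bt8vv", "7pvpzvd1g769d",
             "9gduk46rmt5jy", "NO_SQL"),
  MODULE = c("SQL*Plus", "OGG-RCASI004-OPEN_DATA_SOURCE", "GoldenGate",
             "MMON_SLAVE", "NO_SQL", "XStream", rep("GoldenGate", 8),
             "OGG-RADDR025-OPEN_DATA_SOURCE", rep("GoldenGate", 5)),
  EVENT = c("CPU", "db file sequential read", "CPU", "CPU", "Log archive I/O",
            "CPU", "db file sequential read", "CPU",
            "db file sequential read", "db file sequential read",
            "db file sequential read", "direct path write", "CPU",
            "db file sequential read", "db file sequential read", "CPU",
            "db file sequential read", "CPU", "db file sequential read",
            "CPU"),
  SMPL_CNT = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
               1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 7L),
  stringsAsFactors = FALSE
)
```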

Adding an image of the dataset below for ease of understanding.

The end graph I want is something like this one from Excel:

Value filters used in Excel to get the top 10 events:

Recommended answer

I'll start with the second question, which is easier. In the dplyr package, top_n returns the n rows with the largest values in a given column. For example (p_ash_r_100a is the copy of p_ash_r_100 with SMPL_TIME converted to a date-time, created further down):

> top_n(p_ash_r_100a, 3, SMPL_CNT) %>% arrange(desc(SMPL_CNT))
# A tibble: 3 × 5
            SMPL_TIME        SQL_ID     MODULE                   EVENT SMPL_CNT
               <dttm>         <chr>      <chr>                   <chr>    <int>
1 2017-04-11 09:01:00        NO_SQL GoldenGate                     CPU        7
2 2017-04-11 09:00:00 dgzp3at57cagd GoldenGate db file sequential read        2
3 2017-04-11 09:01:00 37cspa0acgqxp GoldenGate db file sequential read        2

Note that you will get more than n rows if there are ties for nth place. Thus top_n(p_ash_r_100, 10, SMPL_CNT) will return the entire sample data set because of the 17-way tie for 4th.
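
Note that top_n ranks individual rows, whereas the Excel value filter in the question ranks events by their total. The following base R sketch does the latter. It assumes "amount of data" means the summed SMPL_CNT per EVENT: it aggregates the counts per event, keeps the names of the n largest, and filters the rows. top_events() and keep_top_events() are hypothetical helpers, and the data is just the EVENT and SMPL_CNT columns of the sample rows. (In dplyr 1.0.0 and later, top_n is superseded by slice_max, which likewise keeps ties by default.)

```r
# Rank EVENTs by total SMPL_CNT (like Excel's "Top 10" value filter) and
# keep only the rows of the top n events. Helper names are hypothetical.
top_events <- function(df, n = 10) {
  totals <- aggregate(SMPL_CNT ~ EVENT, data = df, FUN = sum)
  ord <- order(totals$SMPL_CNT, decreasing = TRUE)
  head(as.character(totals$EVENT[ord]), n)
}

keep_top_events <- function(df, n = 10) {
  df[df$EVENT %in% top_events(df, n), , drop = FALSE]
}

# EVENT and SMPL_CNT columns of the 20 sample rows.
ash <- data.frame(
  EVENT = c("CPU", "db file sequential read", "CPU", "CPU", "Log archive I/O",
            "CPU", "db file sequential read", "CPU",
            "db file sequential read", "db file sequential read",
            "db file sequential read", "direct path write", "CPU",
            "db file sequential read", "db file sequential read", "CPU",
            "db file sequential read", "CPU", "db file sequential read",
            "CPU"),
  SMPL_CNT = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1,
               1, 1, 2, 1, 1, 1, 1, 1, 7),
  stringsAsFactors = FALSE
)

top_events(ash, 2)             # "CPU" (15 samples), "db file sequential read" (11)
top10 <- keep_top_events(ash)  # only 4 events exist here, so all rows are kept
```

Unlike top_n, head() breaks a tie at the cutoff by position rather than keeping every tied event.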

As for the first question, the documentation for geom_area provides a clue:

An area plot is the continuous analog of a stacked bar chart (see geom_bar), and can be used to show how composition of the whole varies over the range of x.

This suggests that geom_area expects the column mapped to x to be numeric (or a date-time). Based on the listing for p_ash_r_100, SMPL_TIME appears to be a character vector. With the lubridate package, we can convert SMPL_TIME to a date-time with dmy_hm:

library(dplyr)
library(lubridate)
p_ash_r_100a <- p_ash_r_100 %>%
  mutate(SMPL_TIME = dmy_hm(SMPL_TIME))  # "11-APR-17 09:00" is day-month-year h:m
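
If lubridate is not available, base R's as.POSIXct can parse the same format. This sketch assumes the strings always look like "11-APR-17 09:00". Because %b matches month abbreviations in the current LC_TIME locale, the hypothetical helper below switches temporarily to the C locale to get English month names.

```r
# Hypothetical base R stand-in for lubridate::dmy_hm() on "11-APR-17 09:00".
parse_smpl_time <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # restore afterwards
  Sys.setlocale("LC_TIME", "C")  # English month abbreviations for %b
  # %d day, %b month abbreviation, %y two-digit year, %H:%M hour and minute
  as.POSIXct(x, format = "%d-%b-%y %H:%M", tz = "UTC")
}

smpl_time <- parse_smpl_time(c("11-APR-17 09:00", "11-APR-17 09:01"))
class(smpl_time)  # "POSIXct" "POSIXt": continuous, so usable as x in geom_area()
```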

However, this isn't enough to get the plot you want, since there are multiple values of y for each combination of x and fill (fill, not col, is the correct aesthetic for geom_area). We need to summarise the data before plotting:

library(ggplot2)

p_ash_r_100a %>%
  group_by(SMPL_TIME, EVENT) %>%
  summarise(total = sum(SMPL_CNT)) %>%   # one y value per (x, fill) pair
  ggplot(aes(SMPL_TIME, total, fill = EVENT)) +
  geom_area()
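
To see why the summarise step is needed, count how many raw rows share each (SMPL_TIME, EVENT) pair. In this base R sketch on the sample rows (SMPL_TIME is left as character for brevity), table() shows the pairs that have several rows, and aggregate() collapses them to one total per pair, which is what group_by() plus summarise() does above.

```r
# SMPL_TIME, EVENT and SMPL_CNT of the 20 sample rows (SMPL_TIME as character).
ash <- data.frame(
  SMPL_TIME = rep(c("11-APR-17 09:00", "11-APR-17 09:01"), times = c(11, 9)),
  EVENT = c("CPU", "db file sequential read", "CPU", "CPU", "Log archive I/O",
            "CPU", "db file sequential read", "CPU",
            "db file sequential read", "db file sequential read",
            "db file sequential read", "direct path write", "CPU",
            "db file sequential read", "db file sequential read", "CPU",
            "db file sequential read", "CPU", "db file sequential read",
            "CPU"),
  SMPL_CNT = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1,
               1, 1, 2, 1, 1, 1, 1, 1, 7),
  stringsAsFactors = FALSE
)

# Rows per (SMPL_TIME, EVENT) pair: any count above 1 means several y values
# for one x within a single fill group.
rows_per_pair <- table(ash$SMPL_TIME, ash$EVENT)
print(rows_per_pair)

# One total per pair, like group_by(SMPL_TIME, EVENT) + summarise(sum(SMPL_CNT)).
totals <- aggregate(SMPL_CNT ~ SMPL_TIME + EVENT, data = ash, FUN = sum)
print(totals)
```

totals has only 6 rows for 2 times and 4 events: the two pairs that never occur are simply absent, which is the next problem the answer addresses.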

Yet the plot is still not correct. This is because not every combination of SMPL_TIME and EVENT is represented in the data set. We need to explicitly tell geom_area that y is zero for those missing rows. One way is to use the handy fill argument of tidyr::spread.

library(tidyr)

group_by(p_ash_r_100a, SMPL_TIME, EVENT) %>%
  summarise(smpl_sum = sum(SMPL_CNT)) %>%
  ungroup() %>%
  spread(EVENT, smpl_sum, fill = 0) %>%    # one column per EVENT, 0 where missing
  gather(EVENT, smpl_sum, -SMPL_TIME) %>%  # back to long form, for every EVENT
  ggplot(aes(x = SMPL_TIME, y = smpl_sum, fill = EVENT)) +
  geom_area()
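
The spread/gather round trip exists only to create the missing (SMPL_TIME, EVENT) pairs with a value of 0. The following base R sketch does the same thing, starting from the per-pair totals of the sample rows (SMPL_TIME is left as character for brevity). It builds every combination with expand.grid, left-joins the totals with merge, and replaces the resulting NAs with 0. In tidyr, complete(SMPL_TIME, EVENT, fill = list(smpl_sum = 0)) applied after ungroup() is a more direct way to express this step.

```r
# Per-pair totals of the 20 sample rows (only the 6 pairs that occur).
totals <- data.frame(
  SMPL_TIME = rep(c("11-APR-17 09:00", "11-APR-17 09:01"), each = 3),
  EVENT = c("CPU", "db file sequential read", "Log archive I/O",
            "CPU", "db file sequential read", "direct path write"),
  smpl_sum = c(5, 6, 1, 10, 5, 1),
  stringsAsFactors = FALSE
)

# Every SMPL_TIME x EVENT combination: 2 times x 4 events = 8 rows.
grid <- expand.grid(SMPL_TIME = unique(totals$SMPL_TIME),
                    EVENT = unique(totals$EVENT),
                    stringsAsFactors = FALSE)

# Left join: pairs absent from totals come back with smpl_sum = NA ...
full <- merge(grid, totals, all.x = TRUE)
# ... which become the explicit zeros geom_area() needs.
full$smpl_sum[is.na(full$smpl_sum)] <- 0
full <- full[order(full$SMPL_TIME, full$EVENT), ]
print(full)
```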
