Plotting wide format data using R ggplot

Published: 2020/10/16 21:16:11 · Tags: r, dataframe, ggplot2

Question

I have a data frame (see below) that shows sales by region by year. The final column calculates the sum of all the sales in the region over the years shown.

I am new to R and would like to use ggplot to create a SINGLE scatter plot to analyze the data. The x-axis would be the years and the y-axis would be sales.

Ideally, each region would have its own line with points (other than a few NAs) in 2013, 2014, 2015, and 2016. I would then like to color each line based on its region. The sum column should not appear on the plot. Any ideas?

df <- structure(list(Region = structure(1:6, 
                                  .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", 
                                             "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U"), 
                                  class = "factor"), 
               "2016" = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76), 
               "2015" = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38), 
               "2014" = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58), 
               "2013" = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11), 
               "Total Sales" = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72)), 
          row.names = c(NA, 6L), class = "data.frame")

Solution

Your data is in wide format, so it is better to convert it to long format to work with ggplot. Here I use tidyr::gather() to do that, excluding the Total Sales column so that it does not end up on the plot:

library(tidyr)
library(ggplot2)

# Stack the year columns into Year/Sales pairs; Region stays as the id column
# and Total Sales is left out so it is not plotted
df_long <- df %>% 
  gather(Year, Sales, -Region, -`Total Sales`)
df_long
#>    Region Year     Sales
#> 1       A 2016   8758.82
#> 2       B 2016  25559.89
#> 3       C 2016  30848.02
#> 4       D 2016   8696.99
#> 5       E 2016   3621.12
#> 6       F 2016   5468.76
#> 7       A 2015  26521.67
#> 8       B 2015  89544.93
#> 9       C 2015  92825.55
#> 10      D 2015  28916.40
#> 11      E 2015  14004.54
#> 12      F 2015  16618.38
#> 13      A 2014        NA
#> 14      B 2014        NA
#> 15      C 2014 199673.73
#> 16      D 2014  37108.09
#> 17      E 2014  16909.87
#> 18      F 2014  20610.58
#> 19      A 2013  27605.35
#> 20      B 2013        NA
#> 21      C 2013  78794.31
#> 22      D 2013  31824.75
#> 23      E 2013  17990.21
#> 24      F 2013  17307.11
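In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). Below is a minimal sketch of the same reshaping with pivot_longer(), assuming a recent tidyr is installed; the data frame is rebuilt compactly from the question's values so the block runs on its own. One difference worth knowing: pivot_longer() keeps every unselected column as an id column, so Total Sales is dropped before pivoting rather than just left out of cols. The rows also come out grouped by region instead of by year, which does not matter for plotting.

```r
library(tidyr)

# The question's data, rebuilt compactly; check.names = FALSE keeps the
# "2016" and "Total Sales" column names exactly as in the original
df <- data.frame(
  Region = factor(c("A", "B", "C", "D", "E", "F")),
  "2016" = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76),
  "2015" = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38),
  "2014" = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58),
  "2013" = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11),
  "Total Sales" = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72),
  check.names = FALSE
)

# Drop the total first; otherwise pivot_longer() would carry it along as an id column
df_years <- df[setdiff(names(df), "Total Sales")]

# Stack every column except Region into Year (column name) / Sales (value) pairs
df_long <- pivot_longer(
  df_years,
  cols = -Region,
  names_to = "Year",
  values_to = "Sales"
)
df_long
```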

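Plot: specify color = Region and group = Region inside aes() so that ggplot gives each region its own color and connects that region's points with a line. Because Year is a character column (a discrete axis), group = Region is what tells geom_line() which points belong together.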

ggplot(df_long, aes(x = Year, y = Sales, color = Region, group = Region)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = 'Dark2') +
  theme_classic(base_size = 12)
#> Warning: Removed 3 rows containing missing values (geom_point).
#> Warning: Removed 2 rows containing missing values (geom_path).
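Because Year comes out of the reshaping as a character column, the x-axis in the plot above is discrete, which is why group = Region has to be set by hand. A minimal alternative sketch, assuming you would rather treat the years as numbers: convert Year to integer, which gives a continuous axis on which color = Region alone defines the groups. The data frame is rebuilt from the question's values so the block is self-contained. Setting na.rm = TRUE only silences the "Removed ... rows" warnings; it does not change what is drawn.

```r
library(tidyr)
library(ggplot2)

# The question's data, rebuilt compactly (same values as in the question)
df <- data.frame(
  Region = factor(c("A", "B", "C", "D", "E", "F")),
  "2016" = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76),
  "2015" = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38),
  "2014" = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58),
  "2013" = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11),
  "Total Sales" = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72),
  check.names = FALSE
)

# Reshape without the total, then make Year an integer so the x-axis is continuous
df_long <- pivot_longer(df[setdiff(names(df), "Total Sales")],
                        cols = -Region, names_to = "Year", values_to = "Sales")
df_long$Year <- as.integer(df_long$Year)

# With a continuous x, the discrete color = Region defines the groups.
# na.rm = TRUE only suppresses the missing-value warnings: region A's line is
# still broken at its missing 2014 value, and B's leading NAs are skipped.
p <- ggplot(df_long, aes(x = Year, y = Sales, color = Region)) +
  geom_point(na.rm = TRUE) +
  geom_line(na.rm = TRUE) +
  scale_x_continuous(breaks = 2013:2016) +  # whole years only, no 2013.5 ticks
  scale_color_brewer(palette = "Dark2") +
  theme_classic(base_size = 12)
p
```

The explicit breaks keep the continuous axis labelled with whole years only.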

You can also use facet_grid() to give each region its own panel, with a free y-axis scale per panel:

ggplot(df_long, aes(x = Year, y = Sales, group = Region)) +
  geom_point() +
  geom_line() +
  facet_grid(Region ~., scales = 'free_y') +
  theme_bw(base_size = 12)
#> Warning: Removed 3 rows containing missing values (geom_point).
#> Warning: Removed 2 rows containing missing values (geom_path).

Created on 2018-10-12 by the reprex package (v0.2.1.9000)