ggplot: Why do I have to transform the data into the long format?

Question

When plotting with ggplot, I often have to transform the data into the long format, as in the code below. Two questions arise for me:

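  1. Is there a way to use the columns (that is, each variable) as "groups", so that each column is plotted in a different colour and there is no need to transform the data into the long format? (Without putting each variable in its own geom_line().)
  2. Why is it necessary to transform the data into the long format? What is the reason behind it? Is it somehow better for plotting than data in the wide format?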

Example code:

library(tidyverse)

# Data in wide format
df_wide <- data.frame(
  Horizons = seq(1, 10, 1),
  Country1 = c(2.5, 2.3, 2.2, 2.2, 2.1, 2.0, 1.7, 1.8, 1.7, 1.6),
  Country2 = c(3.5, 3.3, 3.2, 3.2, 3.1, 3.0, 3.7, 3.8, 3.7, 3.6),
  Country3 = c(1.5, 1.3, 1.2, 1.2, 1.1, 1.0, 0.7, 0.8, 0.7, 0.6)
)

# Convert to long format
df_long <- df_wide %>%
  gather(key = "variable", value = "value", -Horizons)

# Plot the lines
plotstov <- ggplot(df_long, aes(x = Horizons, y = value)) +
  geom_line(aes(colour = variable, group = variable)) +
  theme_bw()

Output: [plot image not included]

Thanks a lot in advance!

Recommended answer

It's hard to say for sure that this is impossible (for example, someone could write a wrapper package for ggplot that would do this automatically for you), but there's no obvious solution like this.

Hadley Wickham, the author of ggplot, has built the entire "tidyverse" ecosystem on the concept of tidy data, which is essentially data in long format. The basic reason for working with long-format data is that the same data can be represented by many wide formats, but the long format is typically unique. For example, suppose you have data representing revenues by year, country, and industrial sector. In a wide format, do columns represent year, country, sector, or some combination? In the tidyverse/ggplot world, you can simply specify which variable you want to use as the grouping variable. With a wide-format-oriented tool (such as base R's matplot), you would first reshape your data so that the columns represent the grouping variable (say, years), then plot it.
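To make the contrast concrete, here is a minimal sketch. The revenue table, its column names, and all of its values are invented for illustration. With the data in long format, changing the grouping only means changing the aesthetic mapping, whereas matplot needs the data reshaped so that each line is its own column.

```r
library(ggplot2)

# Hypothetical revenue data in long (tidy) format: one row per observation.
revenue <- expand.grid(
  year    = 2019:2021,
  country = c("A", "B"),
  sector  = c("Agriculture", "Industry"),
  stringsAsFactors = FALSE
)
revenue$revenue <- c(10, 11, 12, 20, 21, 23, 5, 6, 8, 15, 14, 16)  # made-up values

# Same long table, two different groupings: only the mapping changes.
p_by_country <- ggplot(revenue, aes(x = year, y = revenue, colour = country)) +
  geom_line() +
  facet_wrap(~ sector)

p_by_sector <- ggplot(revenue, aes(x = year, y = revenue, colour = sector)) +
  geom_line() +
  facet_wrap(~ country)

# A wide-format tool needs one column per line, so the grouping has to be
# fixed when reshaping. Here: one sector, with countries as columns.
industry <- subset(revenue, sector == "Industry")
wide <- reshape(
  industry[, c("year", "country", "revenue")],
  idvar = "year", timevar = "country", direction = "wide"
)
matplot(wide$year, as.matrix(wide[, -1]), type = "l",
        xlab = "year", ylab = "revenue")
```

Plotting by sector with matplot instead would require a second, differently shaped wide table, which is the ambiguity the answer describes.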

Wickham and co-workers built tools like gather (or pivot_longer in newer versions of the tidyverse) to make conversion to long format easy, along with a wide variety of other tools for working with long ("tidy") data.
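As a sketch of the newer interface, the gather call from the question could be written with pivot_longer as follows. The data frame is the question's own df_wide, and the column names "variable" and "value" are kept so that the plotting code would work unchanged.

```r
library(tidyr)

# Wide data from the question
df_wide <- data.frame(
  Horizons = seq(1, 10, 1),
  Country1 = c(2.5, 2.3, 2.2, 2.2, 2.1, 2.0, 1.7, 1.8, 1.7, 1.6),
  Country2 = c(3.5, 3.3, 3.2, 3.2, 3.1, 3.0, 3.7, 3.8, 3.7, 3.6),
  Country3 = c(1.5, 1.3, 1.2, 1.2, 1.1, 1.0, 0.7, 0.8, 0.7, 0.6)
)

# Every column except Horizons is stacked into a name/value pair
df_long <- pivot_longer(
  df_wide,
  cols      = -Horizons,
  names_to  = "variable",
  values_to = "value"
)
```

The rows come out in a different order than with gather (grouped by original row rather than by column), which does not matter for plotting.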

You could write wrappers around ggplot that would do the conversion ...
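One possible shape for such a wrapper is sketched below. The function name ggplot_wide and its arguments are hypothetical, not part of ggplot2 or any package. It assumes that every column other than the x column should become its own line.

```r
library(ggplot2)
library(tidyr)

# Hypothetical helper (not part of ggplot2): reshapes a wide data frame to
# long format internally and draws one coloured line per remaining column.
ggplot_wide <- function(data, x) {
  long <- pivot_longer(
    data,
    cols      = !all_of(x),
    names_to  = "variable",
    values_to = "value"
  )
  ggplot(long, aes(x = .data[[x]], y = value, colour = variable)) +
    geom_line() +
    labs(x = x)
}

# Usage with the wide data from the question
df_wide <- data.frame(
  Horizons = seq(1, 10, 1),
  Country1 = c(2.5, 2.3, 2.2, 2.2, 2.1, 2.0, 1.7, 1.8, 1.7, 1.6),
  Country2 = c(3.5, 3.3, 3.2, 3.2, 3.1, 3.0, 3.7, 3.8, 3.7, 3.6),
  Country3 = c(1.5, 1.3, 1.2, 1.2, 1.1, 1.0, 0.7, 0.8, 0.7, 0.6)
)
p <- ggplot_wide(df_wide, "Horizons") + theme_bw()
```

The wrapper hides the reshaping step but still performs it, so it saves typing without changing how ggplot works underneath.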
