Why does coord_equal break my heatmap

Question

I'm trying to create a heatmap out of the following data:

> head(myData.aggregated)
             datetime value       date                time
1 2016-03-31 14:19:00     3 2016-03-31 2016-06-11 14:19:00
2 2016-03-31 14:49:00    69 2016-03-31 2016-06-11 14:49:00
3 2016-03-31 15:49:00     5 2016-03-31 2016-06-11 15:49:00
4 2016-03-31 16:19:00     7 2016-03-31 2016-06-11 16:19:00
5 2016-03-31 17:49:00     2 2016-03-31 2016-06-11 17:49:00
6 2016-03-31 18:19:00     7 2016-03-31 2016-06-11 18:19:00

> tail(myData.aggregated)
              datetime value       date                time
90 2016-04-06 13:19:00     1 2016-04-06 2016-06-11 13:19:00
91 2016-04-06 13:49:00    25 2016-04-06 2016-06-11 13:49:00
92 2016-04-06 14:19:00     7 2016-04-06 2016-06-11 14:19:00
93 2016-04-06 14:49:00     1 2016-04-06 2016-06-11 14:49:00
94 2016-04-06 22:19:00     3 2016-04-06 2016-06-11 22:19:00
95 2016-04-06 22:49:00    14 2016-04-06 2016-06-11 22:49:00

And the following ggplot2 command:

ggplot(myData.aggregated, aes(x = time, y = date, fill = scale(value))) + geom_tile() + coord_equal()

As soon as I add coord_equal(), the result is a blank graph. Can someone explain why this is happening and how I can fix it? My goal is to get a heatmap with square tiles for each 30-minute interval.

Update 1:

> dput(head(myData.aggregated))
structure(list(datetime = structure(c(1459426740, 1459428540, 
1459432140, 1459433940, 1459439340, 1459441140), class = c("POSIXct", 
"POSIXt"), tzone = ""), value = c(3L, 69L, 5L, 7L, 2L, 7L), date = structure(c(16891, 
16891, 16891, 16891, 16891, 16891), class = "Date"), time = structure(c(1465647540, 
1465649340, 1465652940, 1465654740, 1465660140, 1465661940), class = c("POSIXct", 
"POSIXt"), tzone = "")), .Names = c("datetime", "value", "date", 
"time"), row.names = c(NA, 6L), class = "data.frame")

Recommended answer

TL;DR: The y-axis spans six units and the x-axis spans tens of thousands of units. When you add coord_equal, the y-axis gets squashed to roughly 1/10,000th the physical length of the x-axis, effectively making the plot area disappear. The date column (y-axis) happens to be in days and the time column (x-axis) in seconds, but ggplot treats both as unitless numbers. You can denominate the y-axis in seconds as well, but that will still give you a plot with an undesirable aspect ratio of at least 6:1. See below for code and additional detail.

Here's what's happening: date is in Date format and is therefore denominated in days, with a range of 6 days. time is in POSIXct format, which is denominated in seconds. Since we're only interested in the time of day, regardless of date, its range is tens of thousands of seconds (up to a maximum of 86,400 seconds, the length of one day).

The underlying values of Date and POSIXct formats are just numeric values with, respectively, Date and POSIXct classes attached. As a result, when you add coord_equal, one unit on the y-axis takes up the same physical distance as 1 unit on the x-axis because ggplot (apparently) calculates coord_equal based on the numeric magnitudes of the values, without regard to their date-time class. But the entire y-axis spans 6 units while the x-axis spans tens-of-thousands of units. Thus, when you require coord_equal, the y:x aspect ratio gets squashed to on the order of 1:10,000 or so, making the plot disappear for all practical purposes.
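As a quick illustration of that point, the sketch below takes a few values from the dput() output in the question and strips their classes. The variable names d and tm are only illustrative, and tzone is set to "UTC" here (the question's data uses an empty tzone) so the printed times do not depend on the local time zone.

```r
# Values copied from the dput() output in the question
d  <- structure(16891, class = "Date")
tm <- structure(c(1465647540, 1465661940),
                class = c("POSIXct", "POSIXt"), tzone = "UTC")

format(d)             # the Date prints as a calendar date
unclass(d)            # underneath: days since 1970-01-01
as.numeric(tm)        # underneath: seconds since 1970-01-01 00:00:00 UTC
diff(as.numeric(tm))  # difference between the two times, in seconds
```

One step on the Date scale is a whole day, while the two time values, which are only four hours apart, already differ by 14,400 units.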

You can denominate both the x and y axes in seconds, but even then the y-axis (6 days) will span at least six times the range of the x-axis (at most one day), resulting in a y:x aspect ratio of at least 6:1 with coord_equal. That is better than 1:10,000, but still not very practical.
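The arithmetic behind those two ratios can be sketched as follows. The date range comes from the question's data; the assumption that the x-axis covers a full day (86,400 seconds) is made only for illustration, since the real data may cover less.

```r
# y-axis range in days, from the first and last dates in the question
y_days <- as.numeric(difftime(as.Date("2016-04-06"), as.Date("2016-03-31"),
                              units = "days"))

# Assumption for illustration: the x-axis spans a full day, in seconds
x_secs <- 86400

# y in days, x in seconds: the y:x ratio under coord_equal
y_days / x_secs      # a tiny fraction, i.e. a plot that is almost flat

# y converted to seconds as well
y_secs <- y_days * 86400
y_secs / x_secs      # the plot is now several times taller than wide
```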

Here's an example with fake data:

library(ggplot2)

# Fake data
set.seed(4959)
dat = data.frame(datetime=seq(as.POSIXct("2016-03-31"), as.POSIXct("2016-04-06"), by="hour"))
dat$value = sample(1:50, nrow(dat), replace=TRUE)

ggplot(dat, 
       aes(x = as.POSIXct(as.numeric(datetime) %% 86400, 
                          tz="UTC", origin=as.Date("2016-01-01")), 
           y = as.POSIXct(as.Date(datetime)), 
           fill = scale(value))) + 
  geom_tile() + 
  labs(y="Date", x="Time") + 
  scale_x_datetime(date_labels="%H:%M") +
  coord_equal()

In the code above, to create the y values we first convert to Date format, which eliminates the time of day, and then convert back to POSIXct, which converts the unit to seconds. The resulting time is midnight on that day for all datetime values on a given date.

To create the x values, we just want the time of day in seconds after midnight, so we calculate the remainder of the numeric value of datetime after division by 86400 (the number of seconds in a day). The tz="UTC" argument is necessary to get the hours right, and origin (which can be any date; we just want the time of day) is necessary to get the function to run without an error.
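The modulo step can be checked on a single value. This is a minimal sketch using one timestamp from the question, created explicitly in UTC; the names dt, secs and tod are illustrative.

```r
# One timestamp from the question, created in UTC for reproducibility
dt <- as.POSIXct("2016-03-31 14:19:00", tz = "UTC")

# Seconds since midnight: remainder after dividing by the seconds in a day
secs <- as.numeric(dt) %% 86400

# Attach the seconds to an arbitrary origin date so ggplot sees a POSIXct value
tod <- as.POSIXct(secs, tz = "UTC", origin = as.Date("2016-01-01"))

secs                  # 14 * 3600 + 19 * 60
format(tod, "%H:%M")  # time of day recovered from the remainder
```

Note that the remainder is taken on seconds since the epoch, which is counted in UTC, so for datetimes stored in another time zone the result is the UTC time of day rather than the local one.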

Below is what the plot looks like with and without coord_equal. Note that with coord_equal the x-axis, which spans one day of time (from midnight to midnight) has the same length as one day on the y axis. That's because we denominated both the y and x values in seconds. However, as long as the y axis spans several days and the x-axis spans only one day, coord_equal will result in an undesirable aspect ratio.

Below is a demonstration of how the y-axis gets squashed relative to the x-axis if the y values are denominated in days rather than seconds and coord_equal is specified:

ggplot(dat, 
       aes(x = as.POSIXct(as.numeric(datetime) %% 86400, 
                          tz="UTC", origin=as.Date("2016-01-01")), 
           y = as.Date(datetime), 
           fill = scale(value))) + 
  geom_tile() + 
  labs(y="Date", x="Time") + 
  scale_x_datetime(date_labels="%H:%M") +
  coord_equal()
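The original answer stops at explaining the problem. One possible way to get the square 30-minute tiles the question asks for, offered here only as a sketch and not as the answer author's method, is to keep y in days and x in seconds and tell coord_fixed how many x units should match one y unit. The data below is fake, generated in UTC at 30-minute steps, and tile_secs is a hypothetical helper name.

```r
library(ggplot2)

# Fake data at 30-minute intervals (UTC, so date and time-of-day line up)
set.seed(4959)
dat <- data.frame(datetime = seq(as.POSIXct("2016-03-31", tz = "UTC"),
                                 as.POSIXct("2016-04-06", tz = "UTC"),
                                 by = "30 min"))
dat$value <- sample(1:50, nrow(dat), replace = TRUE)
dat$date  <- as.Date(dat$datetime)                       # y: one unit = 1 day
dat$time  <- as.POSIXct(as.numeric(dat$datetime) %% 86400,
                        tz = "UTC", origin = as.Date("2016-01-01"))  # x: seconds

# Each tile is 1 day tall and 30 * 60 seconds wide, so one y unit should
# take up the same physical length as 1800 x units
tile_secs <- 30 * 60

p <- ggplot(dat, aes(x = time, y = date, fill = value)) +
  geom_tile() +
  labs(y = "Date", x = "Time") +
  scale_x_datetime(date_labels = "%H:%M") +
  coord_fixed(ratio = tile_secs)
p
```

With this ratio the panel should come out roughly 48 tiles wide by 7 tiles tall, so the tiles are square but the plot as a whole is wide and short.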
