How to improve the aspect of ggplot histograms with log scales and discrete values


Problem description

I am trying to improve the clarity and appearance of a histogram of discrete values that I need to show on a log scale.

Please consider the following MWE:

library(ggplot2)

set.seed(99)
# as.integer() truncates the log-normal draws, so the values are discrete
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x = dist)) + geom_histogram()

which produces

and then

ggplot(data, aes(x = dist)) + geom_histogram() + scale_x_log10(breaks = c(1, 2, 3, 4, 5, 10, 100))

which is probably even worse,

since it now gives the impression that something is missing between "1" and "2", and it is also not clear which bar has value "1" (the bar is to the right of the tick) and which bar has value "2" (the bar is to the left of the tick).

I understand that technically ggplot provides the "right" visual answer for a log scale. Yet as an observer I have some trouble understanding it.

Is there a way to improve this?

EDIT:

This is what happens when I apply Jaap's solution to my real data:

Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why does the plot also map x=1.5 and x=2.5?

Solution

The first thing that comes to mind is playing with the binwidth. But that doesn't give a great solution either:

ggplot(data, aes(x=dist)) +
  geom_histogram(binwidth=10) +
  scale_x_continuous(expand=c(0,0)) +
  scale_y_continuous(expand=c(0.015,0)) +
  theme_bw()

gives:


In this case it is probably better to use a density plot. However, when you use scale_x_log10 you will get a warning message (Removed 524 rows containing non-finite values (stat_density)), because the zeros in the data have no finite logarithm. This can be resolved by using a log plus one transformation (log1p, that is log(1 + x)), which maps 0 to 0.
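The reason for the warning can be checked directly on the MWE data. The snippet below is only an illustrative sketch, not part of the original answer; the variable names `n_zero`, `n_nonfinite_log10` and `n_nonfinite_log1p` are made up for this example.

```r
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# as.integer() truncates, so every draw below 1 becomes 0
n_zero <- sum(data$dist == 0)
n_zero

log10(0)  # -Inf: non-finite, so these rows are dropped under scale_x_log10
log1p(0)  # 0: log1p(x) = log(1 + x) keeps the zeros on the axis

# Count the values that each transformation cannot place on the axis
n_nonfinite_log10 <- sum(!is.finite(log10(data$dist)))
n_nonfinite_log1p <- sum(!is.finite(log1p(data$dist)))
```

The number of zeros should correspond to the number of rows reported as removed in the warning, while the log1p transformation leaves no non-finite values.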

The following code:

library(ggplot2)
library(scales)

ggplot(data, aes(x=dist)) +
  stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
  scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
  scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
  theme_bw()

will give this result:
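A density estimate is a continuous curve, so it is also drawn between the integer values (for example at x=1.5), which is one reason dips can appear between neighbouring integers. If that is a problem, one possible alternative, offered here only as a sketch and not as part of the original answer, is to count each discrete value and draw one vertical line per value on the same log1p axis. The object names `counts` and `p` are assumptions made for this example.

```r
library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# One row per distinct integer value, with its frequency in column Freq
counts <- as.data.frame(table(dist = data$dist), stringsAsFactors = FALSE)
counts$dist <- as.integer(counts$dist)

# Each value gets a thin segment and a point exactly at its own x position,
# so nothing is drawn between integers and each tick has an unambiguous mark
p <- ggplot(counts, aes(x = dist, y = Freq)) +
  geom_segment(aes(xend = dist, yend = 0)) +
  geom_point() +
  scale_x_continuous(breaks = c(0, 1, 2, 3, 4, 5, 10, 30, 100, 300, 1000),
                     trans = "log1p") +
  theme_bw()

print(p)
```

Whether this reads better than the density plot depends on the data: with many distinct large values the segments on the right-hand side become dense, and a smoothed curve may be easier to interpret there.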
