ggplot2 log transformation for data and scales


Question


This is a follow-up to my previous question, Integrating ggplot2 with user-defined stat_function(), which I answered myself yesterday. My current problem is that, in the following reproducible example, the lines that are supposed to plot the components of the data's mixture distribution neither appear in the expected places nor have the expected shape, as shown below (see the red lines at y = 0 in the second figure).

Complete reproducible example:

library(ggplot2)
library(scales)
library(RColorBrewer)
library(mixtools)

NUM_COMPONENTS <- 2

set.seed(12345) # for reproducibility

data(diamonds, package='ggplot2')  # use built-in data
myData <- diamonds$price

# extract 'k' components from mixed distribution 'data'
mix.info <- normalmixEM(myData, k = NUM_COMPONENTS,
                        maxit = 100, epsilon = 0.01)
summary(mix.info)

numComponents <- length(mix.info$sigma)
message("Extracted number of component distributions: ",
        numComponents)

calc.components <- function(x, mix, comp.number) {

  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

g <- ggplot(data.frame(x = myData)) +
  scale_fill_continuous("Count", low="#56B1F7", high="#132B43") + 
  scale_x_log10("Diamond Price [log10]",
                breaks = trans_breaks("log10", function(x) 10^x),
                labels = prettyNum) +
  scale_y_continuous("Count") +
  geom_histogram(aes(x = myData, fill = 0.01 * ..density..),
                 binwidth = 0.01)
print(g)

# we could select needed number of colors randomly:
#DISTRIB_COLORS <- sample(colors(), numComponents)

# or, better, use a palette with more color differentiation:
DISTRIB_COLORS <- brewer.pal(numComponents, "Set1")

distComps <- lapply(seq(numComponents), function(i)
  stat_function(fun = calc.components,
                args = list(mix = mix.info, comp.number = i),  # 'args', not 'arg'
                geom = "line", # use alpha=.5 for "polygon"
                size = 1,
                color = "red")) # DISTRIB_COLORS[i]
print(g + distComps)
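One way to see why the red lines hug y = 0: a probability density integrates to 1, so its peak height is on the order of 1/sigma, while the histogram's y axis in this plot counts observations. The sketch below is only an illustration under stated assumptions: it uses simulated log-normal "prices" (a hypothetical stand-in for diamonds$price) and a single moment-matched normal (a stand-in for one normalmixEM component), compares the two magnitudes, and applies the usual n × binwidth rescaling that converts a density to the count scale.

```r
# Hypothetical data: simulated right-skewed "prices" standing in for diamonds$price
set.seed(12345)
prices <- rlnorm(5000, meanlog = 8, sdlog = 1)

# Histogram counts on the raw price scale
binwidth <- 500
breaks <- seq(0, max(prices) + binwidth, by = binwidth)
h <- hist(prices, breaks = breaks, plot = FALSE)
peak_count <- max(h$counts)

# One normal fitted by moments, standing in for a single mixture component
mu <- mean(prices)
sigma <- sd(prices)
peak_density <- dnorm(mu, mean = mu, sd = sigma)  # equals 1 / (sigma * sqrt(2 * pi))

# A density integrates to 1 while a count histogram sums to n,
# so multiplying by n * binwidth puts the curve on the count scale
peak_density_as_count <- peak_density * length(prices) * binwidth

cat("peak count:", peak_count, "\n")
cat("peak density:", signif(peak_density, 3), "\n")
cat("peak density rescaled to counts:", round(peak_density_as_count), "\n")
```

A density curve drawn on a count axis is therefore squashed onto the baseline; either rescale the curve to counts or, as the answer below does, put the histogram on ..density.. instead.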

UPDATE: Just a quick note on my efforts. I have also tried several other options, including switching the plot's x axis back to a linear scale and log-transforming the original data values in the histogram layer instead, like this: geom_histogram(aes(x = log10(data), fill = ..count..), binwidth = 0.01), but the end result remains the same. Regarding my first comment, I realized that the transformation I mentioned is not needed as long as I refer to the ..count.. variable.

UPDATE 2: Changed the color of the lines produced by stat_function() to red to make the problem easier to see.
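The attempt in the first UPDATE, log-transforming only the histogram's x values, still leaves the mixture parameters in raw dollars, so calc.components ends up being evaluated at x values around 2.5 to 4.5 against component means in the thousands. The sketch below reuses calc.components from the question with two hypothetical parameter sets (made-up numbers, not actual normalmixEM output): one expressed in raw price units and one on the log10 scale.

```r
# calc.components as defined in the question
calc.components <- function(x, mix, comp.number) {
  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

# Hypothetical mixture parameters, NOT actual normalmixEM output
raw.mix <- list(lambda = c(0.6, 0.4), mu = c(1000, 6000), sigma = c(500, 3000))  # dollars
log.mix <- list(lambda = c(0.6, 0.4), mu = c(3.0, 3.8), sigma = c(0.2, 0.3))     # log10(dollars)

# x values as they appear on a log10 price axis
x.log <- seq(2.5, 4.5, length.out = 101)

peak.raw <- max(calc.components(x.log, raw.mix, 1))  # parameters in the wrong units
peak.log <- max(calc.components(x.log, log.mix, 1))  # parameters on the same scale as x

cat("raw-unit parameters:", signif(peak.raw, 3), "\n")
cat("log10-unit parameters:", signif(peak.log, 3), "\n")
```

The curve only becomes visible when the mixture is fitted on the same scale as the plotted x values, which is why the answer below passes log10(myData) to normalmixEM.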

Solution

Finally, I have figured out the issues, removed my previous answer, and I'm providing my latest solution below. The only thing I haven't solved is the legend panel for the components: for some reason it doesn't appear, but for an EDA demonstrating the presence of a mixture distribution, I think the result is good enough. The complete reproducible solution follows. Thanks to everybody on SO who helped with this directly or indirectly.

library(ggplot2)
library(scales)
library(RColorBrewer)
library(mixtools)

NUM_COMPONENTS <- 2

set.seed(12345) # for reproducibility

data(diamonds, package='ggplot2')  # use built-in data
myData <- diamonds$price


calc.components <- function(x, mix, comp.number) {

  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}


overlayHistDensity <- function(data, calc.comp.fun) {

  # extract 'k' components from mixed distribution 'data'
  mix.info <- normalmixEM(data, k = NUM_COMPONENTS,
                          maxit = 100, epsilon = 0.01)
  summary(mix.info)

  numComponents <- length(mix.info$sigma)
  message("Extracted number of component distributions: ",
          numComponents)

  DISTRIB_COLORS <- 
    suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))

  # create (plot) histogram and ...
  g <- ggplot(as.data.frame(data), aes(x = data)) +
    geom_histogram(aes(y = ..density..),
                   binwidth = 0.01, alpha = 0.5) +
    theme(legend.position = 'top', legend.direction = 'horizontal')

  comp.labels <- lapply(seq(numComponents),
                        function (i) paste("Component", i))

  # ... fitted densities of components
  distComps <- lapply(seq(numComponents), function (i)
    stat_function(fun = calc.comp.fun,
                  args = list(mix = mix.info, comp.number = i),
                  size = 2, color = DISTRIB_COLORS[i]))

  legend <- list(scale_colour_manual(name = "Legend:",
                                     values = DISTRIB_COLORS,
                                     labels = unlist(comp.labels)))

  return (g + distComps + legend)
}

overlayPlot <- overlayHistDensity(log10(myData),  # fit and plot on the log10 scale
                                  calc.components)  # pass the function itself, not its name
print(overlayPlot)
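The legend most likely stays empty because each stat_function() layer sets color as a fixed parameter rather than mapping it inside aes(), so no colour scale is trained and scale_colour_manual() has nothing to display. Below is a minimal sketch of one workaround, assuming ggplot2 3.4 or later (for after_stat() and linewidth): precompute the component curves in a long data frame and map the component label to colour. The parameters and the simulated log10 data are hypothetical stand-ins for the normalmixEM output and log10(myData).

```r
library(ggplot2)

calc.components <- function(x, mix, comp.number) {
  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

# Hypothetical stand-ins: log10-scale parameters and simulated log10 prices
mix.info <- list(lambda = c(0.55, 0.45), mu = c(2.9, 3.7), sigma = c(0.2, 0.3))
set.seed(12345)
logData <- c(rnorm(550, mean = 2.9, sd = 0.2), rnorm(450, mean = 3.7, sd = 0.3))

comp.labels <- paste("Component", seq_along(mix.info$mu))
DISTRIB_COLORS <- setNames(c("#E41A1C", "#377EB8"), comp.labels)  # first two "Set1" colours

# One row per (x, component), so colour can be mapped to the 'component' column
grid <- seq(min(logData), max(logData), length.out = 200)
curves <- do.call(rbind, lapply(seq_along(comp.labels), function(i)
  data.frame(x = grid,
             y = calc.components(grid, mix.info, i),
             component = comp.labels[i])))

p <- ggplot(data.frame(x = logData), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.05, alpha = 0.5) +
  geom_line(data = curves, aes(y = y, colour = component), linewidth = 1) +
  scale_colour_manual(name = "Legend:", values = DISTRIB_COLORS) +
  theme(legend.position = "top", legend.direction = "horizontal")

print(p)
```

Drawing the curves with geom_line() from precomputed data keeps both the values and the legend under direct control; mapping a constant label, e.g. aes(colour = "Component 1"), inside each stat_function() call should also produce a legend.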

Result:
