How to calculate and label the peak value of a distribution by multiple conditions/facets in R ggplot?


Question

While the question appears similar to others, there's a key difference in my mind.

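  1. I want to be able to calculate and/or print the peak value of the density curve for EACH SUB-CONDITION BY FACET of a faceted density plot (plotting it is the ultimate goal, but calculating it in a data frame is the main goal).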

So, ideally, I would be able to know the intensity (x-axis value) corresponding to the highest peak of the density curves for each condition.

Here's some dummy data:

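set.seed(1234)

library(tidyverse)

n <- 100000
# Cycle through the levels so every silence x treat combination occurs
silence <- factor(rep(c("sil1", "sil2", "sil3", "sil4", "sil5"), length.out = n))
treat <- factor(rep(c("con", "uos", "uos+wnt5a", "wnt5a"), length.out = n))
# replace = TRUE is needed because n exceeds the 6001 values in 4000:10000
intensity <- sample(4000:10000, n, replace = TRUE)

# cbind() would coerce everything into one matrix, so build a data frame instead
df <- tibble(silence, treat, intensity)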

  2. Things I've tried:

  • Subsetting the main DF, then looping through and calculating the density for each condition, but this could take days.
  • Getting close with this answer:
  • Again, it would be sufficient to get the peak values for each of these groups (i.e., treatments by silencing subdistributions) just in the console, but adding them as vertical lines in the graphs would be a sweet cherry on top (it could also make the plot hella busy, so I will see about that piece later).

Thanks!

Recommended answer

Depending on the way you're producing the density plots, there may be a more direct way to recreate the density calculation before it goes into ggplot. That'll be the easiest way to get the peak values and keep them in the format of your data.
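As a minimal sketch of that direct route, the peak of each group's curve can be computed with base R's `density()` inside a grouped dplyr pipeline. The data frame below is hypothetical (it only borrows the `silence`, `treat` and `intensity` column names from the question), and the result should be close to, but not necessarily identical to, what `geom_density()` draws, since the evaluation grid may differ.

```r
library(dplyr)

set.seed(1234)
n <- 20000

# Hypothetical data: each treatment is centred on a different intensity
df <- tibble(
  silence = sample(c("sil1", "sil2"), n, replace = TRUE),
  treat = sample(c("con", "uos"), n, replace = TRUE),
  intensity = rnorm(n, mean = ifelse(treat == "con", 6000, 8000), sd = 500)
)

peaks <- df %>%
  group_by(silence, treat) %>%
  group_modify(~ {
    d <- density(.x$intensity)            # kernel density estimate for this group
    tibble(
      peak_intensity = d$x[which.max(d$y)],  # x value where the curve is highest
      peak_density = max(d$y)                # height of the curve at that point
    )
  }) %>%
  ungroup()

print(peaks)
```

The result has one row per `silence` and `treat` combination, so it can be printed in the console or passed to another plot layer.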

Without that, here's a hack that should work in general, but requires some kludging to fit the extracted points back into the form of your original data.

Here's a plot like yours:

    library(tidyverse)

    # Density of car weight by gear count, one panel per transmission type (am)
    my_plot <- mtcars %>%
      mutate(gear = as.character(gear)) %>%
      ggplot(aes(wt, fill = gear, group = gear)) +
      geom_density(alpha = 0.2) +
      facet_wrap(~am)
    

Here are the components that make up that plot:

    my_plot_innards <- ggplot_build(my_plot)
    

With some ugly hacking we can extract the points that make up the curves and make them look kind of like our original data. Some info is destroyed, e.g. the gear values 3/4/5 become group 1/2/3. There might be a cool way to convert back, but I don't know it yet.

    extracted_points <- tibble(
      wt = my_plot_innards[["data"]][[1]][["x"]],
      y = my_plot_innards[["data"]][[1]][["y"]],
      gear = (my_plot_innards[["data"]][[1]][["group"]] + 2) %>% as.character, # HACK
      am = (my_plot_innards[["data"]][[1]][["PANEL"]] %>% as.numeric) - 1 # HACK
    )
    
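    # Sanity check: the extracted points should trace the same curves
    # (color, not fill, is what the default point shape responds to)
    ggplot(extracted_points, aes(wt, y, color = gear)) +
      geom_point(size = 0.3) +
      facet_wrap(~am)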
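One possible way to convert the group and panel numbers back without the `+ 2` and `- 1` offsets is to index into the sorted unique values of the original columns. This is only a sketch, and it rests on an assumption: that ggplot2 numbers groups and `facet_wrap()` panels in the sorted order of their values. The names `plot_data`, `gear_levels`, `am_levels` and `curve_points` are introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

plot_data <- mtcars %>% mutate(gear = as.character(gear))

p <- ggplot(plot_data, aes(wt, fill = gear, group = gear)) +
  geom_density(alpha = 0.2) +
  facet_wrap(~am)

# Assumption: group 1, 2, 3 and PANEL 1, 2 follow the sorted original values
gear_levels <- sort(unique(plot_data$gear))
am_levels <- sort(unique(plot_data$am))

# layer_data() returns the computed data for one layer of the plot
curve_points <- layer_data(p, 1) %>%
  transmute(
    wt = x,
    y = y,
    gear = gear_levels[group],            # group index -> original gear label
    am = am_levels[as.integer(PANEL)]     # panel index -> original am value
  )

head(curve_points)
```

In `mtcars`, every 3-gear car has `am == 0` and every 5-gear car has `am == 1`, which gives a quick check that the lookup is not scrambling labels.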
    

    extracted_points_notes <- extracted_points %>%
      group_by(gear, am) %>%
      slice_max(y)
    
    
    # Mark the highest point of each curve; the label shows the density height (y)
    my_plot +
      geom_point(data = extracted_points_notes,
                 aes(y = y), color = "red", size = 3, show.legend = FALSE) +
      geom_text(data = extracted_points_notes, hjust = -0.5,
                aes(y = y, label = scales::comma(y)), color = "red", size = 3, show.legend = FALSE)
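The question asks for the x-axis value at each peak and mentions vertical lines, while the labels above show the density height. A rough sketch of that variant follows, again on `mtcars`. The helper `peak_location()` and the objects `peaks` and `peak_plot` are names made up for this example, and the peaks come from base R's `density()` rather than from the plot itself, so they may differ slightly from the drawn curves.

```r
library(dplyr)
library(ggplot2)

plot_data <- mtcars %>% mutate(gear = as.character(gear))

# Hypothetical helper: x position of the highest point of a kernel density
peak_location <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

# One peak per facet (am) and sub-condition (gear)
peaks <- plot_data %>%
  group_by(am, gear) %>%
  summarise(peak_wt = peak_location(wt), .groups = "drop")

peak_plot <- ggplot(plot_data, aes(wt, fill = gear)) +
  geom_density(alpha = 0.2) +
  # Dashed vertical line at each peak, drawn only in the matching panel
  geom_vline(data = peaks, aes(xintercept = peak_wt, color = gear),
             linetype = "dashed", show.legend = FALSE) +
  # Label each line with its x value, pinned to the top of the panel
  geom_text(data = peaks,
            aes(x = peak_wt, y = Inf, label = round(peak_wt, 2), color = gear),
            vjust = 1.5, size = 3, show.legend = FALSE, inherit.aes = FALSE) +
  facet_wrap(~am)

print(peaks)
peak_plot
```

Because `peaks` carries the `am` column, `facet_wrap()` places each line and label in the right panel. With many sub-conditions the labels can overlap, so printing `peaks` alone may be the more readable output.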
    
