Find all local maxima of a geom_smooth curve in R ggplot?


Question


I need to find all local maxima of a geom_smooth() curve in R. This has been asked on Stack Overflow before:

How can I get the peak and valleys of a geom_smooth line in ggplot2?

But that answer only covers finding a single maximum. What if there are multiple local maxima we want to find?

Here's some sample data:

library(tidyverse)

set.seed(404)
# Note: rnorm(100, ...) is recycled to length 1000, so the noise pattern repeats every 100 points
df <- data.frame(x = seq(0,4*pi,length.out=1000),
                 y = sin(seq(0,4*pi,length.out=1000))+rnorm(100,0,1))

df %>% ggplot(aes(x=x,y=y)) +
  geom_point() +
  geom_smooth()

To find a single maximum, we use the function underlying geom_smooth() to get the y values of the curve. By default this is loess() for fewer than 1,000 observations and gam() from library(mgcv) for 1,000 or more; in this case, it's gam(). Finding the maximum is then a simple matter of subsetting with which.max(). We can plot the modeled y values over geom_smooth() to confirm they match, with the maximum marked by a vertical line:

library(mgcv)

df <- df %>% 
  mutate(smooth_y = predict(gam(y ~ s(x,bs="cs"),data=df)))

maximum <- df$x[which.max(df$smooth_y)]

df %>% ggplot() +
  geom_point(aes(x=x,y=y)) +
  geom_smooth(aes(x=x,y=y)) +
  geom_line(aes(x=x,y=smooth_y),size = 1.5, linetype = 2, col = "red")  +
  geom_vline(xintercept = maximum,color="green")
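
One caveat on the comparison: to my understanding, ggplot2 3.3.0 and later pass method = "REML" to gam() when geom_smooth() chooses the GAM automatically, whereas a bare gam() call defaults to GCV-based smoothness selection. The two curves can therefore differ slightly. The following is a minimal sketch, assuming that ggplot2 default, of a fit that should mirror the plotted curve more closely. The names fit_default, fit_reml and max_gap are illustrative only.

```r
library(mgcv)

set.seed(404)
df <- data.frame(x = seq(0, 4 * pi, length.out = 1000))
df$y <- sin(df$x) + rnorm(100, 0, 1)  # noise recycled to length 1000, as in the sample data

# Bare gam() call: smoothness selected with the mgcv default ("GCV.Cp")
fit_default <- gam(y ~ s(x, bs = "cs"), data = df)

# Assumption: this matches what geom_smooth() fits in recent ggplot2 versions
fit_reml <- gam(y ~ s(x, bs = "cs"), data = df, method = "REML")

# Largest vertical distance between the two fitted curves
max_gap <- max(abs(predict(fit_default) - predict(fit_reml)))

# Location of the single global maximum of the REML fit
df$smooth_y <- as.vector(predict(fit_reml))
maximum <- df$x[which.max(df$smooth_y)]
```

If max_gap is negligible for your data, either fit serves equally well for locating peaks.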

So far, so good. But there is more than one maximum here. Maybe we're trying to find the period of the sine wave, measured as the average distance between maxima. How do we make sure we find all maxima in the series?

I am posting my answer below, but I am wondering if there's a more elegant solution than the brute-force method I used.

Solution

You can use run-length encoding to find the points where the difference between successive values flips sign. Note that this method is approximate and relies on x being ordered. You can refine the locations by predicting at more closely spaced x-values.
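
To see the mechanics before applying them to the smoothed curve, here is a small sketch on a toy vector. The helper name find_maxima is hypothetical and not part of the original answer. It also drops index 1, on the assumption that a curve which starts out decreasing should not count its first point as a local maximum.

```r
# Hypothetical helper (not from the original answer): indices of interior local maxima
find_maxima <- function(y) {
  runs <- rle(diff(y) > 0)                           # TRUE while rising, FALSE while falling or flat
  starts <- cumsum(runs$lengths) - runs$lengths + 1  # first index of each run
  ids <- starts[!runs$values]                        # a falling run starts at a peak
  ids[ids > 1]                                       # index 1 is a boundary, not a local maximum
}

y <- c(0, 1, 3, 2, 1, 2, 5, 4, 4.5, 3)
find_maxima(y)
#> [1] 3 7 9

# A common equivalent idiom, assuming y has no exactly flat stretches
which(diff(sign(diff(y))) == -2) + 1
#> [1] 3 7 9
```

Both approaches only compare neighbouring values, which is why x must be sorted and why the result is only as precise as the spacing of the points.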

library(tidyverse)
library(mgcv)

set.seed(404)
# Note: rnorm(100, ...) is recycled to length 1000, so the noise pattern repeats every 100 points
df <- data.frame(x = seq(0,4*pi,length.out=1000),
                 y = sin(seq(0,4*pi,length.out=1000))+rnorm(100,0,1))

df <- df %>% 
  mutate(smooth_y = predict(gam(y ~ s(x,bs="cs"),data=df)))

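# Run-length encode the sign of the difference between successive smoothed values
# (stored as `runs` to avoid masking the rle() function)
runs <- rle(diff(as.vector(df$smooth_y)) > 0)
# Calculate the start index of each run
starts <- cumsum(runs$lengths) - runs$lengths + 1
# Take the starts of the FALSE runs, where the difference flips from positive to negative.
# If the curve begins by decreasing, the first of these is index 1, a boundary rather than a true local maximum.
maxima_id <- starts[!runs$values]

# Also convenient, but not in the question:
# minima_id <- starts[runs$values]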


maximum <- df$x[maxima_id]

df %>% ggplot() +
  geom_point(aes(x=x,y=y)) +
  geom_smooth(aes(x=x,y=y)) +
  geom_line(aes(x=x,y=smooth_y),size = 1.5, linetype = 2, col = "red")  +
  geom_vline(xintercept = maximum,color="green")
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Created on 2020-12-24 by the reprex package (v0.3.0)
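
The refinement mentioned above, and the period estimate the question asks about, could be sketched as follows. This is one possible approach under stated assumptions rather than part of the original answer: the 10,000-point grid is an arbitrary choice, and fit, fine, maxima and period are illustrative names.

```r
library(mgcv)

set.seed(404)
df <- data.frame(x = seq(0, 4 * pi, length.out = 1000))
df$y <- sin(df$x) + rnorm(100, 0, 1)  # noise recycled to length 1000, as in the sample data

# Keep the fitted model so it can be evaluated at arbitrary x values
fit <- gam(y ~ s(x, bs = "cs"), data = df)

# Predict on a grid ten times denser than the data (grid size is an assumption)
fine <- data.frame(x = seq(min(df$x), max(df$x), length.out = 10000))
fine$smooth_y <- as.vector(predict(fit, newdata = fine))

# Same sign-flip detection as in the answer, applied to the dense predictions
runs <- rle(diff(fine$smooth_y) > 0)
starts <- cumsum(runs$lengths) - runs$lengths + 1
maxima_id <- starts[!runs$values]
maxima_id <- maxima_id[maxima_id > 1]  # drop the boundary index if the curve starts by falling
maxima <- fine$x[maxima_id]

# Rough period estimate: mean spacing between consecutive maxima (NA with fewer than two)
period <- if (length(maxima) >= 2) mean(diff(maxima)) else NA_real_
```

For a clean sine wave the spacing would be 2 * pi. How close the estimate comes depends on how well the smoother recovers the underlying curve, and any spurious maxima near the ends of the range will pull the average off.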
