用r查找双峰分布中的局部最小值 [英] Find local minimum in bimodal distribution with r

查看：261 发布时间：2020/4/25 11:24:34 r statistics kernel distribution kernel-density

本文介绍了用r查找双峰分布中的局部最小值的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的数据是经过预处理的图像数据，我想将两个类分开.在理论上(希望在实践中)，最佳阈值是双峰分布数据中两个峰之间的局部最小值.

我的测试数据是: http://www.file-upload .net/download-9365389/data.txt.html

我试图遵循

我将计算密度函数的一阶和二阶导数以找到局部极值，特别是局部最小值.但是我不知道如何在R中执行此操作，并且density(test)似乎不是正常功能.因此，请帮助我:如何计算导数并找到密度函数density(test)的两个峰之间的凹坑的局部最小值?

解决方案

有几种方法可以做到这一点.

首先，按照问题中的说明使用d作为密度，d$x和d$y包含密度图的x和y值.最小值在导数dy/dx = 0时发生.由于x值的间距相等，我们可以使用diff(d$y)估算dy，并在d$x最小化abs(diff(d$y))的情况下寻找d$x:

d$x[which.min(abs(diff(d$y)))]
# [1] 2.415785

问题在于，当dy/dx = 0时，密度曲线也可能最大化.在这种情况下，最小值最小，但最大值已达到峰值，因此可以使用，但是您不能依靠这一点.

因此，第二种方法使用optimize(...)，它在给定的时间间隔内寻找局部最小值. optimize(...)需要一个函数作为参数，因此我们使用approxfun(d$x,d$y)创建一个插值函数.

optimize(approxfun(d$x,d$y),interval=c(1,4))$minimum
# [1] 2.415791

最后，我们证明这确实是最低要求:

hist(data,prob=TRUE)
lines(d, col="red", lty=2)
v <- optimize(approxfun(d$x,d$y),interval=c(1,4))$minimum
abline(v=v, col="blue")

实际上是首选的另一种方法是使用k-均值聚类.

df <- read.csv(header=F,"data.txt")
colnames(df) = "X"

# bimodal
km <- kmeans(df,centers=2)
df$clust <- as.factor(km$cluster)
library(ggplot2)
ggplot(df, aes(x=X)) + 
  geom_histogram(aes(fill=clust,y=..count../sum(..count..)),
                     binwidth=0.5, color="grey50")+
  stat_density(geom="line", color="red")

实际上，数据看起来比双峰更具三峰性.

# trimodal
km <- kmeans(df,centers=3)
df$clust <- as.factor(km$cluster)
library(ggplot2)
ggplot(df, aes(x=X)) + 
  geom_histogram(aes(fill=clust,y=..count../sum(..count..)),
                 binwidth=0.5, color="grey50")+
  stat_density(geom="line", color="red")

My data are pre-processed image data and I want to seperate two classes. In therory (and hopefully in practice) the best threshold is the local minimum between the two peaks in the bimodal distributed data.

My testdata is: http://www.file-upload.net/download-9365389/data.txt.html

I tried to follow this thread: I plotted the histogram and calculated the kernel density function:

datafile <- read.table("....txt")
data <- data$V1
hist(data)

d <- density(data) # returns the density data with defaults
hist(data,prob=TRUE)
lines(d) # plots the results

But how to continue?

I would calculate the first and second derivates of the density function to find the local extrema, specifically the local minimum. However I have no idea how to do this in R and density(test) seems not to be a normal function. Thus please help me: how can I calculate the derivates and find the local minimum of the pit between the two peaks in the density function density(test)?

解决方案

There are a few ways to do this.

First, using d for the density as in your question, d$x and d$y contain the x and y values for the density plot. The minimum occurs when the derivative dy/dx = 0. Since the x-values are equally spaced, we can estimate dy using diff(d$y), and seek d$x where abs(diff(d$y)) is minimized:

d$x[which.min(abs(diff(d$y)))]
# [1] 2.415785

The problem is that the density curve could also be maximized when dy/dx = 0. In this case the minimum is shallow but the maxima are peaked, so it works, but you can't count on that.

So a second way uses optimize(...) which seeks a local minimum in a given interval. optimize(...) needs a function as argument, so we use approxfun(d$x,d$y) to create an interpolation function.

optimize(approxfun(d$x,d$y),interval=c(1,4))$minimum
# [1] 2.415791

Finally, we show that this is indeed the minimum:

hist(data,prob=TRUE)
lines(d, col="red", lty=2)
v <- optimize(approxfun(d$x,d$y),interval=c(1,4))$minimum
abline(v=v, col="blue")

Another approach, which is preferred actually, uses k-means clustering.

df <- read.csv(header=F,"data.txt")
colnames(df) = "X"

# bimodal
km <- kmeans(df,centers=2)
df$clust <- as.factor(km$cluster)
library(ggplot2)
ggplot(df, aes(x=X)) + 
  geom_histogram(aes(fill=clust,y=..count../sum(..count..)),
                     binwidth=0.5, color="grey50")+
  stat_density(geom="line", color="red")

The data actually looks more trimodal than bimodal.

# trimodal
km <- kmeans(df,centers=3)
df$clust <- as.factor(km$cluster)
library(ggplot2)
ggplot(df, aes(x=X)) + 
  geom_histogram(aes(fill=clust,y=..count../sum(..count..)),
                 binwidth=0.5, color="grey50")+
  stat_density(geom="line", color="red")

这篇关于用r查找双峰分布中的局部最小值的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

用r查找双峰分布中的局部最小值 [英] Find local minimum in bimodal distribution with r

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

用r查找双峰分布中的局部最小值 [英] Find local minimum in bimodal distribution with r

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭