Scatter plot kernel smoothing: ksmooth() does not smooth my data at all
Question
I want to smooth my explanatory variable, something like the speed data of a vehicle, and then use the smoothed values. I searched a lot and found nothing that directly answers my question.
I know how to calculate a kernel density estimate (density() or KernSmooth::bkde()), but I don't know how to then calculate the smoothed values of speed.
Thanks to @ZheyuanLi, I am able to better explain what I have and what I want to do, so I have re-edited my question as below.
I have some speed measurements of a vehicle over time, stored as a data frame vehicle:
t speed
1 0 0.0000000
2 1 0.0000000
3 2 0.0000000
4 3 0.0000000
5 4 0.0000000
. . .
. . .
1031 1030 4.8772222
1032 1031 4.4525000
1033 1032 3.2261111
1034 1033 1.8011111
1035 1034 0.2997222
1036 1035 0.2997222
[Figure: scatter plot of speed against t]
I want to smooth speed against t, and I want to use kernel smoothing for this purpose. According to @Zheyuan's advice, I should use ksmooth():
fit <- ksmooth(vehicle$t, vehicle$speed)
However, I found that the smoothed values are exactly the same as my original data:
sum(abs(fit$y - vehicle$speed)) # 0
Why is this happening? Thanks!
Recommended answer
Answer to the old question
You need to distinguish kernel density estimation from kernel smoothing.
Density estimation works with a single variable only. It aims to estimate how this variable is spread out over its physical domain. For example, if we have 1000 normal samples:
x <- rnorm(1000, 0, 1)
We can assess its distribution with a kernel density estimator:
k <- density(x)
plot(k); rug(x)
The rug marks on the x-axis show the locations of your x values, while the curve measures the density of those marks.
Kernel smoothing is actually a regression problem, or scatter plot smoothing problem. You need two variables: a response variable y and an explanatory variable x. Let's just use the x we have above as the explanatory variable. For the response variable y, we generate some toy values from
y <- sin(x) + rnorm(1000, 0, 0.2)
[Figure: scatter plot of y against x]
We want to find a smooth function to approximate those scattered dots.
The Nadaraya-Watson kernel regression estimate, available through the R function ksmooth(), will help you:
s <- ksmooth(x, y, kernel = "normal")
plot(x,y, main = "kernel smoother")
lines(s, lwd = 2, col = 2)
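To make the estimator less of a black box, here is a minimal sketch of what a Nadaraya-Watson fit computes: at each evaluation point, a weighted mean of y in which nearby observations count more. The helper nw_smooth(), the seed and the evaluation grid are assumptions made for illustration; the bandwidth scaling follows the statement in ?ksmooth that kernels are scaled so their quartiles sit at +/- 0.25 * bandwidth.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
x <- rnorm(1000, 0, 1)
y <- sin(x) + rnorm(1000, 0, 0.2)

# nw_smooth() is a hypothetical helper, not part of any package.
# At each evaluation point p it returns a weighted mean of y, with
# Gaussian weights that decay with the distance between x and p.
nw_smooth <- function(x, y, x0, bandwidth = 0.5) {
  # Convert ksmooth()'s bandwidth convention to a Gaussian standard deviation
  h <- 0.25 * bandwidth / qnorm(0.75)
  vapply(x0, function(p) {
    w <- dnorm((x - p) / h)
    sum(w * y) / sum(w)
  }, numeric(1))
}

grid <- seq(-2, 2, length.out = 50)    # evaluation points (an arbitrary choice)
by_hand  <- nw_smooth(x, y, grid)
built_in <- ksmooth(x, y, kernel = "normal", bandwidth = 0.5, x.points = grid)

# The two should agree closely; ksmooth() additionally drops far-away weights.
max(abs(by_hand - built_in$y))
```

The bandwidth controls how quickly the weights decay: a larger value averages over more neighbours and gives a smoother curve.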
If you want to interpret everything in terms of prediction:
- kernel density estimation: given x, predict the density of x; that is, we have an estimate of the probability P(grid[n] < x < grid[n+1]), where grid is a set of grid points;
- kernel smoothing: given x, predict y; that is, we have an estimate of the function f(x), which approximates y.
In both cases, you have no smoothed value of the explanatory variable x. So your question, "I want to smooth my explanatory variable", makes no sense.
Do you actually have a time series?
"Speed of a vehicle" sounds like you are monitoring speed over time t. If so, make a scatter plot of speed against t and use ksmooth().
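If ksmooth() then returns your data unchanged, as in the re-edited question, a likely cause is the default bandwidth. Per ?ksmooth, the defaults are kernel = "box" and bandwidth = 0.5, and kernels are scaled so that their quartiles sit at +/- 0.25 * bandwidth, so the default box window reaches only 0.25 either side of each evaluation point. With t spaced one unit apart, each window holds a single observation and nothing gets averaged. The sketch below uses simulated data as a stand-in for vehicle (the speed profile and noise level are made up), and bandwidth = 20 is an arbitrary value for illustration, not a recommendation.

```r
set.seed(42)                                   # arbitrary seed
# Simulated stand-in for the `vehicle` data frame: one reading per time unit.
vehicle <- data.frame(t = as.numeric(0:1035))
vehicle$speed <- pmax(0, 10 * sin(vehicle$t / 150)^2 +
                         rnorm(nrow(vehicle), 0, 0.5))

# Default call: box kernel, bandwidth = 0.5, so each window holds one point.
fit_default <- ksmooth(vehicle$t, vehicle$speed)
sum(abs(fit_default$y - vehicle$speed))        # 0: nothing is averaged

# Wider bandwidth (20 is an assumed value; tune it for your data).
# x.points = vehicle$t returns the fit at the observed times.
fit_wide <- ksmooth(vehicle$t, vehicle$speed, kernel = "normal",
                    bandwidth = 20, x.points = vehicle$t)
vehicle$speed_smooth <- fit_wide$y             # smoothed values, ready for later use

plot(vehicle$t, vehicle$speed, col = 8, main = "kernel smoother, bandwidth = 20")
lines(fit_wide, lwd = 2, col = 2)
```

The bandwidth is expressed in the units of t, so a sensible value depends on how many neighbouring readings you want each smoothed value to average over.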
Other smoothing approaches, such as loess() and smooth.spline(), are not in the kernel smoothing class, but you can compare them.
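One way to run such a comparison is to overlay the three fits on the same scatter plot. This is only a sketch on the toy data from earlier, regenerated here with an arbitrary seed; loess() and smooth.spline() are left at their default settings, which may not suit real data.

```r
set.seed(1)                                    # arbitrary seed
x <- rnorm(1000, 0, 1)
y <- sin(x) + rnorm(1000, 0, 0.2)
ord <- order(x)                                # lines() needs x in increasing order

fit_ks <- ksmooth(x, y, kernel = "normal", bandwidth = 0.5)
fit_lo <- loess(y ~ x)                         # local regression, default span
fit_ss <- smooth.spline(x, y)                  # smoothing spline, default tuning

plot(x, y, col = 8, main = "three smoothers")
lines(fit_ks, lwd = 2, col = 2)
lines(x[ord], fitted(fit_lo)[ord], lwd = 2, col = 3)
lines(fit_ss, lwd = 2, col = 4)
legend("topleft", c("ksmooth", "loess", "smooth.spline"), col = 2:4, lwd = 2)
```

Each method has its own tuning parameter (bandwidth, span, and the spline penalty respectively), so differences between the curves partly reflect those settings rather than the methods themselves.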