Function to calculate Euclidean distance in R
Question
I am trying to implement a KNN classifier in R from scratch on the iris data set, and as part of this I have written a function to calculate the Euclidean distance. Here is my code.
known_data <- iris[1:15, c("Sepal.Length", "Petal.Length", "Class")]
unknown_data <- iris[16, c("Sepal.Length", "Petal.Length")]
# euclidean distance
euclidean_dist <- function(k, unk) {
  distance <- 0
  for(i in 1:nrow(k))
    distance[i] <- sqrt((k[,1][i] - unk[,1][i])^2 + (k[,2][i] - unk[,2][i])^2)
  return(distance)
}
euclidean_dist(known_data, unknown_data)
However, when I call the function, it returns the first value correctly and the rest as NA. Could anyone show me where I went wrong with the code? Thanks in advance.
Recommended answer
The aim is to calculate the distance between the ith row of known_data and the single unknown_data point.
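Written out, with x_{i1}, x_{i2} standing for the two features in row i of known_data and u_1, u_2 for the two features of the unknown point (these symbols are introduced here only for illustration), the quantity wanted is roughly:

```latex
d_i = \sqrt{(x_{i1} - u_1)^2 + (x_{i2} - u_2)^2}, \qquad i = 1, \dots, n
```

Only the known data carries the index i; the unknown point is the same for every row.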
How to fix your code
When you calculate distance[i], you're trying to access the ith row of the unknown data point. That row doesn't exist for any i greater than 1, so the result is NA. I believe your code should run fine if you make the following edits:
# "Species" is the column name in R's built-in iris (the question used "Class")
known_data <- iris[1:15, c("Sepal.Length", "Petal.Length", "Species")]
unknown_data <- iris[16, c("Sepal.Length", "Petal.Length")]
# euclidean distance
euclidean_dist <- function(k, unk) {
  # Pre-allocate distance as a vector (not strictly required)
  distance <- rep(0, nrow(k))
  for (i in seq_len(nrow(k))) {
    # Change unk[,1][i] to unk[1,1], and similarly unk[,2][i] to unk[1,2]
    distance[i] <- sqrt((k[, 1][i] - unk[1, 1])^2 + (k[, 2][i] - unk[1, 2])^2)
  }
  return(distance)
}
euclidean_dist(known_data, unknown_data)
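To see why the original loop produced NA from the second element onwards, here is a minimal sketch. The one-row data frame `unk` and its values are hypothetical stand-ins for unknown_data, not taken from the question.

```r
# Hypothetical one-row data frame standing in for unknown_data
unk <- data.frame(Sepal.Length = 5.7, Petal.Length = 1.5)

col1 <- unk[, 1]    # a numeric vector of length 1
first <- col1[1]    # the only element that exists
second <- col1[2]   # index past the end: R returns NA instead of an error

print(first)
print(second)

# Any arithmetic involving NA is itself NA, which is what filled distance[2:15]
print(sqrt((5.1 - second)^2))
```

Because R does not raise an error when a vector is indexed past its end, the mistake only shows up as NA values in the result.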
One final note: in the version of R I'm using, the iris data set has a Species column rather than a Class column.
An alternative method
As suggested by @Roman Luštrik, the entire aim of getting the Euclidean distances can be achieved with a simple one-liner:
sqrt((known_data[, 1] - unknown_data[, 1])^2 + (known_data[, 2] - unknown_data[, 2])^2)
This is very similar to the function you wrote, but it works in vectorised form rather than through a loop: R recycles the single unknown_data value against every row of known_data. Vectorised code is often the preferable way of doing things in R.
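If more than two feature columns are needed later on, the same vectorised idea might be generalised along the following lines. This is only a sketch: the function name `euclidean_dist_vec` and the variables `known` and `unknown` are hypothetical, and it assumes every column passed in is numeric and that `unk` has a single row.

```r
# Sketch: distance from every row of `k` to the single row `unk`,
# for any number of numeric columns (hypothetical helper, not from the answer)
euclidean_dist_vec <- function(k, unk) {
  k_mat <- as.matrix(k)                           # n x p numeric matrix
  unk_vec <- unlist(unk[1, ], use.names = FALSE)  # length-p vector for the query point
  diffs <- sweep(k_mat, 2, unk_vec)               # subtract unk_vec[j] from column j
  unname(sqrt(rowSums(diffs^2)))                  # one distance per row of k
}

cols <- c("Sepal.Length", "Petal.Length")
known <- iris[1:15, cols]
unknown <- iris[16, cols]
d <- euclidean_dist_vec(known, unknown)
print(d)
```

For the two columns used here, this should give the same values as the one-liner; `sweep` simply makes the column-wise subtraction explicit so that the number of features is not hard-coded.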