Function to calculate Euclidean distance in R

Question

I am trying to implement a KNN classifier in R from scratch on the iris data set, and as part of this I have written a function to calculate the Euclidean distance. Here is my code:

known_data <- iris[1:15,c("Sepal.Length", "Petal.Length", "Class")]
unknown_data <- iris[16,c("Sepal.Length", "Petal.Length")]

# euclidean distance
 euclidean_dist <- function(k,unk) {
 distance <- 0
 for(i in 1:nrow(k))
 distance[i] <- sqrt((k[,1][i] - unk[,1][i])^2 + (k[,2][i] - unk[,2][i])^2)
 return(distance)
} 

euclidean_dist(known_data, unknown_data)

However, when I call the function, it returns the first value correctly and the rest as NA. Could anyone show me where I went wrong in the code? Thanks in advance.

Recommended answer

The aim is to calculate the distance between the ith row of known_data and the single unknown_data point, for each i.

How to fix the code

When you calculate distance[i], you index the ith element of unk[,1], but unknown_data has only one row, so that element does not exist for any i greater than 1 and the result is NA. Your code should run fine with the following edits:

# Built-in iris calls the label column "Species"; use "Class" only if your copy names it that way
known_data <- iris[1:15, c("Sepal.Length", "Petal.Length", "Species")]
unknown_data <- iris[16, c("Sepal.Length", "Petal.Length")]

# Euclidean distance from each row of k to the single row of unk
euclidean_dist <- function(k, unk) {
  # Preallocate the result vector (not strictly required, but clearer)
  distance <- rep(0, nrow(k))

  for (i in seq_len(nrow(k))) {
    # Use unk[1,1] and unk[1,2] instead of unk[,1][i] and unk[,2][i]
    distance[i] <- sqrt((k[,1][i] - unk[1,1])^2 + (k[,2][i] - unk[1,2])^2)
  }

  return(distance)
}

euclidean_dist(known_data, unknown_data)
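To see why the original loop produced NA after the first element, here is a small sketch that reproduces the indexing on its own. The variable name unk is only for illustration; it holds the same single row as unknown_data.

```r
# Single-row data frame, the same selection as unknown_data
unk <- iris[16, c("Sepal.Length", "Petal.Length")]

first_col <- unk[, 1]      # selecting one column drops to a plain numeric vector
length(first_col)          # 1, because there is only one row

in_range  <- first_col[1]  # the only element that exists
out_range <- first_col[2]  # past the end of the vector, so R returns NA

is.na(out_range)           # TRUE
```

R does not raise an error when a vector is indexed past its end; it silently returns NA, which is why the loop appeared to work for i = 1 and then filled the rest of distance with NA.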

One final note: the built-in iris data set that ships with R names the label column Species rather than Class, so selecting "Class" fails with an "undefined columns selected" error unless your copy of the data uses that name.

An alternative approach

As suggested by @Roman Luštrik, the Euclidean distances can all be computed with a simple one-liner:

sqrt((known_data[, 1] - unknown_data[, 1])^2 + (known_data[, 2] - unknown_data[, 2])^2)

This is very similar to the function you wrote, but it is vectorised rather than written as a loop, which is often the preferable way of doing things in R. It works because unknown_data[, 1] and unknown_data[, 2] are length-1 vectors, which R recycles across every row of known_data.
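If more than two feature columns are needed, the same vectorised idea can be generalised. The following is a minimal sketch, not part of the original answer: the helper name euclidean_dist_vec is hypothetical, and it assumes that all columns passed in are numeric and that unknown has exactly one row.

```r
# Hypothetical helper: distance from every row of `known` to the single row `unknown`
euclidean_dist_vec <- function(known, unknown) {
  known <- as.matrix(known)               # numeric matrix, one row per known point
  unknown <- as.numeric(unlist(unknown))  # plain numeric vector, one value per column
  stopifnot(ncol(known) == length(unknown))

  diffs <- sweep(known, 2, unknown)       # subtract unknown from each row (default FUN is "-")
  sqrt(rowSums(diffs^2))                  # one distance per known row
}

features <- c("Sepal.Length", "Petal.Length")
known_data <- iris[1:15, features]
unknown_data <- iris[16, features]

d <- euclidean_dist_vec(known_data, unknown_data)
```

With two columns this should give the same 15 distances as the one-liner; only the label column has to be left out of the selection, since as.matrix would otherwise produce a character matrix.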
