Drawing decision boundaries in R
Question
I've got a series of modelled class labels from the knn function. I have a data frame with basic numeric training data, and another data frame with test data. How would I go about drawing a decision boundary for the values returned by the knn function? I'll have to replicate my findings on a locked-down machine, so please limit the use of third-party libraries if possible.
I only have two class labels, "orange" and "blue". They're plotted on a simple 2D plot with the training data. Again, I just want to draw a boundary around the results from the knn function.
Code:
library(class)
n <- 100
set.seed(1)
x <- round(runif(n, 1, n))
set.seed(2)
y <- round(runif(n, 1, n))
train.df <- data.frame(x, y)
set.seed(1)
x.test <- round(runif(n, 1, n))
set.seed(2)
y.test <- round(runif(n, 1, n))
test.df <- data.frame(x.test, y.test)
k <- knn(train.df, test.df, classes, k=25)
plot(test.df, col=k)
classes is just a vector of class labels, one per training row, determined by an earlier bit of code.
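The snippet above will not run on its own because classes is defined elsewhere. A minimal sketch for trying it in isolation, assuming a hypothetical deterministic labelling rule (orange when both coordinates are under 50) rather than the randomised rule from the full code, might look like this:

```r
library(class)  # "class" is a recommended package bundled with standard R installs

n <- 100
set.seed(1)
x <- round(runif(n, 1, n))
set.seed(2)
y <- round(runif(n, 1, n))
train.df <- data.frame(x, y)

# Hypothetical stand-in for `classes`: one label per training row.
# This rule is an assumption for illustration only.
classes <- factor(ifelse(x < 50 & y < 50, "orange", "blue"))

# Classify the training points themselves, just to have labels to plot
k <- knn(train.df, train.df, classes, k = 25)

# A factor passed to col= is read as integer codes into palette(),
# so convert to character to get the literal colours "blue" and "orange"
plot(train.df, col = as.character(k), pch = 19)
```

Note that plot(test.df, col=k) with a factor k colours the points by level index, so the colours shown are the first two palette entries rather than blue and orange. Converting with as.character(k) is one way around that.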
If you need it, below is the complete code for my work:
library(class)
n <- 100
set.seed(1)
x <- round(runif(n, 1, n))
set.seed(2)
y <- round(runif(n, 1, n))
# ============================================================
# Bayes Classifier + Decision Boundary Code
# ============================================================
classes <- "null"
colours <- "null"
for (i in 1:n)
{
# P(C = j | X = x, Y = y) = prob
# "The probability that the class (C) is orange (j) when X is some x, and Y is some y"
# Two predictors that influence classification: x, y
# If x and y are both under 50, there is a 95% chance of being orange (grouping)
# If x and y are both over 50, or if one of them is over 50, grouping is blue
# Algorithm favours whichever grouping has a higher chance of success, then plots using that colour
# When prob (from above) is 50%, the boundary is drawn
percentChance <- 0
if (x[i] < 50 && y[i] < 50)
{
# 95% chance of orange and 5% chance of blue
# Bayes Decision Boundary therefore assigns to orange when x < 50 and y < 50
# "colours" is the Decision Boundary grouping, not the plotted grouping
percentChance <- 95
colours[i] <- "orange"
}
else
{
percentChance <- 10
colours[i] <- "blue"
}
if (round(runif(1, 1, 100)) > percentChance)
{
classes[i] <- "blue"
}
else
{
classes[i] <- "orange"
}
}
boundary.x <- seq(0, 100, by=1)
boundary.y <- 0
for (i in 1:101)
{
if (i > 49)
{
boundary.y[i] <- -10 # just for the sake of visual consistency, real value is 0
}
else
{
boundary.y[i] <- 50
}
}
df <- data.frame(boundary.x, boundary.y)
plot(x, y, col=classes)
lines(df, type="l", lty=2, lwd=2, col="red")
# ============================================================
# K-Nearest neighbour code
# ============================================================
#library(class)
#n <- 100
#set.seed(1)
#x <- round(runif(n, 1, n))
#set.seed(2)
#y <- round(runif(n, 1, n))
train.df <- data.frame(x, y)
set.seed(1)
x.test <- round(runif(n, 1, n))
set.seed(2)
y.test <- round(runif(n, 1, n))
test.df <- data.frame(x.test, y.test)
k <- knn(train.df, test.df, classes, k=25)
plot(test.df, col=k)
Recommended answer
Get the class probability predictions on a grid, and draw a contour line at P = 0.5 (or whatever cutoff you want). This is also the method used in the classic MASS textbook by Venables and Ripley, and in The Elements of Statistical Learning by Hastie, Tibshirani and Friedman.
# class labels: simple distance from origin
classes <- ifelse(x^2 + y^2 > 60^2, "blue", "orange")
classes.test <- ifelse(x.test^2 + y.test^2 > 60^2, "blue", "orange")
# prediction grid covering the plotting region; x varies fastest
grid <- expand.grid(x=1:100, y=1:100)
classes.grid <- knn(train.df, grid, classes, k=25, prob=TRUE) # note last argument
# "prob" holds the vote share of the winning class; convert it to P(blue)
prob.grid <- attr(classes.grid, "prob")
prob.grid <- ifelse(classes.grid == "blue", prob.grid, 1 - prob.grid)
# plot the boundary
contour(x=1:100, y=1:100, z=matrix(prob.grid, nrow=100), levels=0.5,
        col="grey", drawlabels=FALSE, lwd=2)
# add points from test dataset
points(test.df, col=classes.test)
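The answer's code relies on train.df, x.test and y.test from the question. For anyone who wants to run the whole thing in one go, here is a self-contained sketch of the same grid-and-contour approach. The single seed, the separately drawn test sample, the gx/gy/z names, and plotting the points first and then overlaying the contour with add=TRUE are my own choices for illustration, not part of the original answer.

```r
library(class)  # recommended package bundled with standard R installs

set.seed(1)
n <- 100
train.df <- data.frame(x = round(runif(n, 1, n)), y = round(runif(n, 1, n)))
test.df  <- data.frame(x = round(runif(n, 1, n)), y = round(runif(n, 1, n)))

# Same illustrative labelling as the answer: distance from the origin
label <- function(d) ifelse(d$x^2 + d$y^2 > 60^2, "blue", "orange")
classes      <- factor(label(train.df))
classes.test <- label(test.df)

# Grid over the plotting region; expand.grid varies x fastest
gx <- 1:100
gy <- 1:100
grid <- expand.grid(x = gx, y = gy)

# prob = TRUE attaches the winning class's vote share as attribute "prob"
pred   <- knn(train.df, grid, classes, k = 25, prob = TRUE)
p.win  <- attr(pred, "prob")
p.blue <- ifelse(pred == "blue", p.win, 1 - p.win)

# Rows of z index x and columns index y, which is the layout contour() expects
z <- matrix(p.blue, nrow = length(gx))

# Test points first, then the P(blue) = 0.5 contour on top
plot(test.df$x, test.df$y, col = classes.test, pch = 19,
     xlim = range(gx), ylim = range(gy), xlab = "x", ylab = "y")
contour(gx, gy, z, levels = 0.5, drawlabels = FALSE,
        lwd = 2, col = "grey", add = TRUE)
```

Changing levels moves the cutoff, and a finer grid (for example seq(1, 100, by = 0.5) for both axes) should give a smoother line at the cost of more knn predictions.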