Drawing decision boundaries in R


Question


I've got a series of modelled class labels from the knn function. I've got a data frame with basic numeric training data, and another data frame for test data. How would I go about drawing a decision boundary for the returned values from the knn function? I'll have to replicate my findings on a locked-down machine, so please limit the use of 3rd party libraries if possible.


I only have two class labels, "orange" and "blue". They're plotted on a simple 2D plot with the training data. Again, I just want to draw a boundary around the results from the knn function.

Code:

library(class)

n <- 100

set.seed(1)
x <- round(runif(n, 1, n))
set.seed(2)
y <- round(runif(n, 1, n))
train.df <- data.frame(x, y)

set.seed(1)
x.test <- round(runif(n, 1, n))
set.seed(2)
y.test <- round(runif(n, 1, n))
test.df <- data.frame(x.test, y.test)

k <- knn(train.df, test.df, classes, k=25)

plot(test.df, col=k)


classes is just a vector of class labels determined from an earlier bit of code.


If you need it, below is the complete code for my work:

library(class)

n <- 100
set.seed(1)
x <- round(runif(n, 1, n))
set.seed(2)
y <- round(runif(n, 1, n))

# ============================================================
# Bayes Classifier + Decision Boundary Code
# ============================================================

classes <- "null"
colours <- "null"

for (i in 1:n)
{

    # P(C = j | X = x, Y = y) = prob
    # "The probability that the class (C) is orange (j) when X is some x, and Y is some y"
    # Two predictors that influence classification: x, y
    # If x and y are both under 50, there is a 95% chance of being orange (grouping)
    # If x or y is 50 or over, the grouping is blue (only a 10% chance of orange)
    # Algorithm favours whichever grouping has a higher chance of success, then plots using that colour
    # When prob (from above) is 50%, the boundary is drawn

    percentChance <- 0
    if (x[i] < 50 && y[i] < 50)
    {
        # 95% chance of orange and 5% chance of blue
        # Bayes Decision Boundary therefore assigns to orange when x < 50 and y < 50
        # "colours" is the Decision Boundary grouping, not the plotted grouping
        percentChance <- 95
        colours[i] <- "orange"
    }
    else
    {
        percentChance <- 10
        colours[i] <- "blue"
    }

    if (round(runif(1, 1, 100)) > percentChance)
    {
        classes[i] <- "blue"
    }
    else
    {
        classes[i] <- "orange"
    }
}

boundary.x <- seq(0, 100, by=1)
boundary.y <- 0
for (i in 1:101)
{
    if (i > 49)
    {
        boundary.y[i] <- -10 # just for the sake of visual consistency, real value is 0
    }
    else
    {
        boundary.y[i] <- 50
    }
}
df <- data.frame(boundary.x, boundary.y)

plot(x, y, col=classes)
lines(df, type="l", lty=2, lwd=2, col="red")

# ============================================================
# K-Nearest neighbour code
# ============================================================

#library(class)

#n <- 100

#set.seed(1)
#x <- round(runif(n, 1, n))
#set.seed(2)
#y <- round(runif(n, 1, n))
train.df <- data.frame(x, y)

set.seed(1)
x.test <- round(runif(n, 1, n))
set.seed(2)
y.test <- round(runif(n, 1, n))
test.df <- data.frame(x.test, y.test)

k <- knn(train.df, test.df, classes, k=25)

plot(test.df, col=k)

Answer


Get the class probability predictions on a grid, and draw a contour line at P=0.5 (or whatever you want the cutoff point to be). This is also the method used in the classic MASS textbook by Venables and Ripley, and in Elements of Statistical Learning by Hastie, Tibshirani and Friedman.

# class labels: simple distance from origin
classes <- ifelse(x^2 + y^2 > 60^2, "blue", "orange")
classes.test <- ifelse(x.test^2 + y.test^2 > 60^2, "blue", "orange")

grid <- expand.grid(x=1:100, y=1:100)
classes.grid <- knn(train.df, grid, classes, k=25, prob=TRUE)  # note last argument
prob.grid <- attr(classes.grid, "prob")   # proportion of votes for the winning class at each grid point
prob.grid <- ifelse(classes.grid == "blue", prob.grid, 1 - prob.grid)   # convert to P(class == "blue")

# plot the boundary
contour(x=1:100, y=1:100, z=matrix(prob.grid, nrow=100), levels=0.5,
        col="grey", drawlabels=FALSE, lwd=2)
# add points from test dataset
points(test.df, col=classes.test)
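
A possible extension: if you also want to see which region each part of the plane is assigned to, the same grid predictions can shade the background. This is a minimal sketch, assuming the objects created in the snippet above (grid, classes.grid, prob.grid, test.df, classes.test) are still in the workspace; it stays within base graphics and the class package, so it should also run on a locked-down machine.

# shade the predicted regions, then draw the P = 0.5 boundary and the test points on top
bg.col <- ifelse(classes.grid == "blue", "lightblue", "navajowhite")
plot(grid, col=bg.col, pch=20, cex=0.3)                     # grid points as a coloured backdrop
contour(x=1:100, y=1:100, z=matrix(prob.grid, nrow=100), levels=0.5,
        col="grey", drawlabels=FALSE, lwd=2, add=TRUE)      # add=TRUE overlays on the existing plot
points(test.df, col=classes.test)                           # test points on top

Note that contour() takes add=TRUE here because plot() has already set up the coordinate system; in the snippet above, contour() is drawn first and therefore creates the plot itself.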

