Variation on "How to plot decision boundary of a k-nearest neighbor classifier from Elements of Statistical Learning?"

Question

This is a question related to https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o

For completeness, here's the original example from that link:

library(ElemStatLearn)
library(class)
x <- mixture.example$x
g <- mixture.example$y
xnew <- mixture.example$xnew
# Classify every grid point; "prob" is the vote share of the winning class
mod15 <- knn(x, xnew, g, k=15, prob=TRUE)
prob <- attr(mod15, "prob")
# Two classes only: turn the winning-class share into the share for class "1"
prob <- ifelse(mod15=="1", prob, 1-prob)
px1 <- mixture.example$px1
px2 <- mixture.example$px2
prob15 <- matrix(prob, length(px1), length(px2))
par(mar=rep(2,4))
# The decision boundary is the 0.5 contour of the class "1" share
contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="", main=
        "15-nearest neighbour", axes=FALSE)
points(x, col=ifelse(g==1, "coral", "cornflowerblue"))
# Colour the background grid by predicted class
gd <- expand.grid(x=px1, y=px2)
points(gd, pch=".", cex=1.2, col=ifelse(prob15>0.5, "coral", "cornflowerblue"))
box()

I've been playing with that example, and would like to try to make it work with three classes. I can change some values of g with something like

g[8:16] <- 2

just to pretend that some samples come from a third class. I can't make the plot work, though. I guess I need to change the lines that deal with the proportion of votes for the winning class:

prob <- attr(mod15, "prob")
prob <- ifelse(mod15=="1", prob, 1-prob)

and also the levels on the contour:

contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="", main=
"15-nearest neighbour", axes=FALSE)

I am also not sure contour is the right tool for this. One alternative that works is to create a matrix of data that covers the region I'm interested in, classify each point of this matrix, and plot the points with a large marker and different colors, similar to what is being done with the points(gd...) bit.

The final purpose is to be able to show the different decision boundaries generated by different classifiers. Can someone point me in the right direction?

Thanks, Rafael

Solution

Separating the main parts of the code helps outline how to achieve this:

Test data with 3 classes

 train <- rbind(iris3[1:25,1:2,1],
                iris3[1:25,1:2,2],
                iris3[1:25,1:2,3])
 cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))

Test data covering a grid

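 # expand.grid() is part of base R, so no extra package is needed here.
 # The grid extends one unit beyond the training data in each direction.
 test <- expand.grid(x=seq(min(train[,1]-1), max(train[,1]+1),
                           by=0.1),
                     y=seq(min(train[,2]-1), max(train[,2]+1),
                           by=0.1))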

Classification for that grid

There are obviously three classes here.

 require(class)
 classif <- knn(train, test, cl, k = 3, prob=TRUE)
 prob <- attr(classif, "prob")
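
According to the knn() documentation, the "prob" attribute holds the proportion of votes for the winning class only, not one value per class. That is why the two-class trick of taking 1-prob does not carry over to three classes. A minimal check on the same iris3 subset, where the two query points are arbitrary illustrative values that do not appear in the original post:

```r
library(class)

# Same training data as above: 25 rows per species, first two measurements
train <- rbind(iris3[1:25, 1:2, 1],
               iris3[1:25, 1:2, 2],
               iris3[1:25, 1:2, 3])
cl <- factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))

# Two arbitrary query points (assumed values, for illustration only)
query <- rbind(c(5.0, 3.5), c(6.5, 3.0))

pred <- knn(train, query, cl, k = 3, prob = TRUE)
vote_share <- attr(pred, "prob")

# One number per query point: the vote share of the predicted class
print(data.frame(predicted = as.character(pred), vote_share = vote_share))
```

Because only the winner's share is reported, the shares of the two losing classes cannot be recovered from this attribute.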

Data structure for plotting

 require(dplyr)

 dataf <- bind_rows(mutate(test,
                           prob=prob,
                           cls="c",
                           prob_cls=ifelse(classif==cls,
                                           1, 0)),
                    mutate(test,
                           prob=prob,
                           cls="v",
                           prob_cls=ifelse(classif==cls,
                                           1, 0)),
                    mutate(test,
                           prob=prob,
                           cls="s",
                           prob_cls=ifelse(classif==cls,
                                           1, 0)))
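
The three mutate() calls differ only in the class label, so the same one-vs-rest table could also be built in a loop over the factor levels. The following is a sketch in base R; make_indicator and dataf2 are hypothetical names introduced here, not part of the original answer:

```r
library(class)

# Rebuild the inputs so that this block runs on its own
train <- rbind(iris3[1:25, 1:2, 1],
               iris3[1:25, 1:2, 2],
               iris3[1:25, 1:2, 3])
cl <- factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))
test <- expand.grid(x = seq(min(train[, 1]) - 1, max(train[, 1]) + 1, by = 0.1),
                    y = seq(min(train[, 2]) - 1, max(train[, 2]) + 1, by = 0.1))
classif <- knn(train, test, cl, k = 3, prob = TRUE)

# Hypothetical helper: one copy of the grid for class k, with a 0/1 column
# marking the grid points that were assigned to that class
make_indicator <- function(k) {
  data.frame(test,
             prob = attr(classif, "prob"),
             cls = k,
             prob_cls = as.numeric(classif == k))
}

# Stack one copy per class; the columns match those of dataf above
dataf2 <- do.call(rbind, lapply(levels(cl), make_indicator))
```

Each grid point appears once per class, and exactly one of its copies has prob_cls equal to 1. The contour drawn between 0 and 1 for each class is then that class's boundary against the rest.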

Plot

 require(ggplot2)
 ggplot(dataf) +
    geom_point(aes(x=x, y=y, col=cls),
               data = mutate(test, cls=classif),
               size=1.2) + 
    geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls),
                 bins=2,
                 data=dataf) +
    geom_point(aes(x=x, y=y, col=cls),
               size=3,
               data=data.frame(x=train[,1], y=train[,2], cls=cl))

We can also be a little fancier and plot the probability of class membership as an indication of the "confidence".

 ggplot(dataf) +
    geom_point(aes(x=x, y=y, col=cls, size=prob),
               data = mutate(test, cls=classif)) + 
    scale_size(range=c(0.8, 2)) +
    geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls),
                 bins=2,
                 data=dataf) +
    geom_point(aes(x=x, y=y, col=cls),
               size=3,
               data=data.frame(x=train[,1], y=train[,2], cls=cl)) +
    geom_point(aes(x=x, y=y),
               size=3, shape=1,
               data=data.frame(x=train[,1], y=train[,2], cls=cl))
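
The same one-vs-rest idea should also work with base graphics, staying close to the contour() call in the question: draw one 0.5-level contour per class over a 0/1 indicator matrix. This is only a sketch under that assumption; the colour choices and the names gx, gy, grid and ind are illustrative and not taken from the original answer:

```r
library(class)

train <- rbind(iris3[1:25, 1:2, 1],
               iris3[1:25, 1:2, 2],
               iris3[1:25, 1:2, 3])
cl <- factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))

# Grid axes; expand.grid() varies x fastest, which matches the column-wise
# filling of matrix() used below
gx <- seq(min(train[, 1]) - 1, max(train[, 1]) + 1, by = 0.1)
gy <- seq(min(train[, 2]) - 1, max(train[, 2]) + 1, by = 0.1)
grid <- expand.grid(x = gx, y = gy)
classif <- knn(train, grid, cl, k = 3)

# Illustrative colours, one per class label
cols <- c(c = "coral", s = "cornflowerblue", v = "seagreen")

# Background: grid points coloured by predicted class
plot(grid$x, grid$y, pch = ".", cex = 1.2,
     col = cols[as.character(classif)],
     xlab = "", ylab = "", main = "3-nearest neighbour")

# One boundary per class: the 0.5 contour of its 0/1 indicator
for (k in levels(cl)) {
  ind <- matrix(as.numeric(classif == k), length(gx), length(gy))
  contour(gx, gy, ind, levels = 0.5, drawlabels = FALSE,
          add = TRUE, col = cols[k])
}

# Training points on top
points(train, col = cols[as.character(cl)], pch = 19)
```

Swapping knn() for another classifier's predictions on the same grid leaves the plotting code unchanged, which suits the stated goal of comparing boundaries across classifiers.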
