Fit decision boundary to logistic regression model in R


Question




I'm struggling to plot a decision boundary in R using ggplot.

I have two variables (exam scores) and a binary outcome indicating whether or not a student was admitted to school. The data look like this:

> head(exam.data)
  Exam1Score Exam2Score Admitted
1   34.62366   78.02469        0
2   30.28671   43.89500        0
3   35.84741   72.90220        0
4   60.18260   86.30855        1
5   79.03274   75.34438        1
6   45.08328   56.31637        0

I can plot the data using ggplot:

exam.plot <- ggplot(data=exam.data, aes(x=Exam1Score, y=Exam2Score, col = ifelse(Admitted == 1,'dark green','red'), size=0.5))+
  geom_point()+
  labs(x="Exam 1 Scores", y="Exam 2 Scores", title="Exam Scores", colour="Exam Scores")+
  theme_bw()+
  theme(legend.position="none")

and then successfully fit the logistic regression model:

exam.lm <- glm(data=exam.data, formula=Admitted ~ Exam1Score + Exam2Score, family="binomial") 

After much searching the web, I decided to compute the decision boundary manually (I did try for a while to do this with stat_smooth but couldn't get it to work). I tried the following:

# Fit the decision boundary
plot_x <- c(min(exam.data$Exam1Score)-2, max(exam.data$Exam1Score)+2)
plot_y <- (-1 /coef(exam.lm)[3]) * (coef(exam.lm)[2] * plot_x + coef(exam.lm)[1])
db.data <- data.frame(rbind(plot_x, plot_y))
colnames(db.data) <- c('x','y')

# Add the decision boundary plot
ggplot()+geom_line(data=db.data, aes(x=x, y=y))

which successfully plots the decision boundary, but I can't add it to my existing plot with:

> exam.plot+geom_line(data=db.data, aes(x=x, y=y))
Error: Aesthetics must either be length one, or the same length as the dataProblems:x, y

Can someone point out what I'm doing wrong or whether I can actually do this with +stat_smooth()?

All code (ex2.R) and files are here: https://github.com/StuHorsman/rscripts/tree/master/R/Coursera

Thanks!

Stuart

Update: I can achieve something similar with base graphics:

plot(exam.data$Exam1Score, exam.data$Exam2Score, type="n", xlab="Exam 1 Scores", ylab="Exam 2 Scores")      
points(exam.data$Exam1Score[exam.data$Admitted==1], exam.data$Exam2Score[exam.data$Admitted==1], pch=4, col="green")  
points(exam.data$Exam1Score[exam.data$Admitted==0], exam.data$Exam2Score[exam.data$Admitted==0], pch=4, col="red")        
lines(db.data, col="blue")

Solution

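The problem is that in exam.plot you use not only the aesthetics x and y, but also col and size (the latter unnecessarily). Every layer inherits the aesthetics defined in the ggplot() call, so each layer must either be able to evaluate them on its own data or set them explicitly. Here db.data has no Admitted column, so the inherited col mapping cannot be evaluated for the line layer. (I've often been caught by that problem.)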

Thus:

exam.plot+geom_line(data=db.data, aes(x=x, y=y), col = "black", size = 1)

does plot.
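Another way to get the same effect is to stop the line layer from inheriting the plot-level aesthetics at all, using the `inherit.aes` argument that ggplot2 layers accept. The following is only a minimal sketch: it assumes ggplot2 is installed, and both the simulated `exam.data` and the two-point `db.data` are made-up stand-ins rather than the data from the question.

```r
library(ggplot2)

# Simulated stand-in for the question's data (values are illustrative only)
set.seed(1)
exam.data <- data.frame(Exam1Score = runif(100, 30, 100),
                        Exam2Score = runif(100, 30, 100))
exam.data$Admitted <- as.integer(exam.data$Exam1Score + exam.data$Exam2Score > 130)

# Plot-level aesthetics as in the question: col and size are mapped for every layer
exam.plot <- ggplot(data = exam.data,
                    aes(x = Exam1Score, y = Exam2Score,
                        col = ifelse(Admitted == 1, 'dark green', 'red'),
                        size = 0.5)) +
  geom_point()

# Hypothetical boundary: two points on the line Exam1Score + Exam2Score = 130
db.data <- data.frame(x = c(30, 100), y = c(100, 30))

# inherit.aes = FALSE makes this layer ignore the col and size mappings above,
# so db.data only has to provide x and y
boundary.plot <- exam.plot +
  geom_line(data = db.data, aes(x = x, y = y), inherit.aes = FALSE)
print(boundary.plot)
```

Setting constants in the layer (as above with col and size) and `inherit.aes = FALSE` both work; the latter is convenient when the plot-level mapping contains several aesthetics that are irrelevant to the extra layer.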

However, I'd recommend changing exam.plot a bit: remove from the ggplot() call all aesthetics that do not apply to every layer, and put them into the definition of the layer that needs them instead:

exam.plot <- ggplot(data = exam.data, aes(x = Exam1Score, y = Exam2Score)) +
  # colour is mapped only in the point layer; factor() keeps it discrete even if Admitted is numeric 0/1
  geom_point(aes(col = factor(Admitted)), size = 0.5) +
  scale_color_manual(values = c('red', 'dark green')) +
  labs(x = "Exam 1 Scores", y = "Exam 2 Scores", title = "Exam Scores", colour = "Exam Scores") +
  theme_bw() +
  coord_equal() +  # assuming that the scores have the same scale
  theme(legend.position = "none")

exam.plot + geom_line(data=db.data, aes(x=x, y=y))

With the example data

exam.data <- data.frame(Exam1Score = rnorm(100) + 0:1,
                        Exam2Score = rnorm(100) + 0:1,
                        Admitted = factor(rep(0:1, 50)))

this yields:

(plotted with default size, 0.5 would hardly be visible for this example)
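For reference, the whole workflow can be sketched end to end. The decision boundary is the set of points where the fitted log-odds are zero, that is b0 + b1 * Exam1Score + b2 * Exam2Score = 0, which rearranges to Exam2Score = -(b0 + b1 * Exam1Score) / b2. One detail worth checking when building the line's data frame: combining the two vectors with rbind() places them in rows, so after renaming the columns each column would mix an x value with a y value; constructing the data frame column-wise avoids this. The data and coefficients below are simulated assumptions for illustration, not the values from the question's file.

```r
library(ggplot2)

# Simulated stand-in for exam.data (illustrative values, not the real file)
set.seed(42)
n <- 200
exam.data <- data.frame(Exam1Score = runif(n, 30, 100),
                        Exam2Score = runif(n, 30, 100))
# Admission probability rises with both scores (made-up coefficients)
p <- plogis(-12 + 0.1 * exam.data$Exam1Score + 0.1 * exam.data$Exam2Score)
exam.data$Admitted <- rbinom(n, 1, p)

exam.lm <- glm(Admitted ~ Exam1Score + Exam2Score,
               data = exam.data, family = "binomial")
b <- unname(coef(exam.lm))  # b[1] intercept, b[2] Exam1Score, b[3] Exam2Score

# Boundary: b[1] + b[2] * x + b[3] * y = 0  =>  y = -(b[1] + b[2] * x) / b[3]
plot_x <- range(exam.data$Exam1Score) + c(-2, 2)
db.data <- data.frame(x = plot_x,
                      y = -(b[1] + b[2] * plot_x) / b[3])

# Only x and y are mapped at plot level, so the line layer inherits nothing it cannot supply
exam.plot <- ggplot(exam.data, aes(x = Exam1Score, y = Exam2Score)) +
  geom_point(aes(col = factor(Admitted))) +
  scale_color_manual(values = c('red', 'dark green')) +
  geom_line(data = db.data, aes(x = x, y = y)) +
  theme_bw() +
  theme(legend.position = "none")
print(exam.plot)
```

A quick sanity check is to call predict() with type = "response" on the two boundary points: both should come out at a probability of 0.5.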
