Y axes on the logit scale and centered in gbm.plot

Question

I am currently exploring the gbm functions in the dismo package to create boosted regression trees for species distribution modeling. I have been using the dismo vignettes as well as the 2008 paper "A working guide to boosted regression trees" by Elith et al., published in the Journal of Animal Ecology. On pages 808-809 of that article, the authors explain partial dependence plots and give an example at the bottom of page 809 (Fig. 6). According to the dismo vignette "Boosted Regression Trees for ecological modeling", gbm.plot "Plots the partial dependence of the response on one or more predictors".

gbm.plot creates plots that look almost exactly like the example in Elith et al. However, there are a few settings I cannot figure out how to change to reproduce the paper's figure exactly.

  1. The y-axes in the paper are on the logit scale and are centered to have a zero mean over the data distribution. The y-axes in gbm.plot represent the fitted function.

  2. The rug in the paper is at the top of the plots; in gbm.plot the rug is at the bottom.

  3. gbm.plot uses the variable name as the x-axis label. The paper has meaningful axis labels.
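
To make point 1 concrete: a partial-dependence curve can be put on the logit scale with qlogis() and then centered by subtracting its mean over the observed predictor values. The sketch below is a minimal base-R illustration under stated assumptions: center_pd_logit() is a hypothetical helper, and it assumes you already have the curve evaluated on a grid (either as probabilities or as logits) plus the observed predictor values.

```r
# Hypothetical helper: put a partial-dependence curve on the logit scale and
# center it so that its mean over the observed data is zero.
#   grid_x : increasing predictor values at which the curve was evaluated
#   grid_y : curve values, as probabilities ("response") or logits ("link")
#   x_obs  : observed predictor values (the data distribution)
center_pd_logit <- function(grid_x, grid_y, x_obs, type = c("link", "response")) {
  type <- match.arg(type)
  # qlogis() is the logit function, log(p / (1 - p))
  eta <- if (type == "response") qlogis(grid_y) else grid_y
  # Evaluate the curve at every observed value (linear interpolation,
  # clamped at the grid ends), so the mean is weighted by the data
  eta_at_data <- approx(grid_x, eta, xout = x_obs, rule = 2)$y
  eta - mean(eta_at_data)
}

# Toy example: probabilities on a grid, data observed only at the two ends
grid_x <- 1:5
grid_y <- c(0.1, 0.2, 0.5, 0.8, 0.9)
centered <- center_pd_logit(grid_x, grid_y, x_obs = c(1, 5), type = "response")
centered[3]  # approximately 0: qlogis(0.1) and qlogis(0.9) cancel
```

Averaging over the observed values rather than over the evenly spaced grid is what "zero mean over the data distribution" asks for; a plain grid average would weight sparsely sampled parts of the predictor range as heavily as densely sampled ones.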

Here is the figure from the Elith paper compared with one produced with gbm.plot:

[Figure: Fig. 6 from Elith et al. (2008)]

[Figure: partial dependence plots produced by gbm.plot]

My solutions

While looking for solutions I came across this question and it gave me the idea to look at the source code (a first for me). From the source, I was able to get a good idea of how the function is put together, but there is still much I don't understand.

  1. I am not sure what to change to transform the y-axes to the logit scale and center them to have a mean of zero.

  2. I was able to change the source to move the rug to the top of the plots: I found the call to the rug function and added the argument side=3.

  3. For the variable names, I figure I need to make a list of appropriate variable names, attach it to the data, and somehow read it into the source code. Still over my head.
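
Points 2 and 3 both come down to ordinary base-graphics arguments: rug() accepts side = 3 to draw the ticks along the top axis, and plot() takes any string as xlab. A minimal standalone sketch of one panel, assuming the centered curve values are already computed; plot_pd_panel() is a hypothetical helper, not part of dismo.

```r
# Hypothetical helper: draw one partial-dependence panel with the rug on the
# top axis and a descriptive x-axis label.
plot_pd_panel <- function(grid_x, eta_centered, x_obs, xlab,
                          ylab = "fitted function (logit, centered)") {
  plot(grid_x, eta_centered, type = "l", xlab = xlab, ylab = ylab)
  abline(h = 0, lty = 3)   # dotted reference line at the zero mean
  rug(x_obs, side = 3)     # side = 3 puts the rug ticks on the top axis
  invisible(eta_centered)
}

# Usage with toy values, drawn to a null device so nothing is written
pdf(NULL)
x_obs <- c(1, 2, 2, 3, 5)
out <- plot_pd_panel(1:5, c(-2, -1, 0, 1, 2), x_obs, xlab = "Elevation")
dev.off()
```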

I would be grateful for any input. I also suspect that other ecologists using the Elith paper as a guide may run into the same problem.

Here is an example of the code I ran to produce the plots:

gbm.plot(all.sum.tc4.lr001, rug=TRUE, smooth=TRUE, n.plots=9, common.scale=TRUE, write.title = FALSE, show.contrib=TRUE, plot.layout=c(2,3), cex.lab=1.5)

Solution

This is late, but I can offer a roundabout solution to problem 3: adding custom x-axis labels to gbm.plot. I'm sure there's a better way, but here is what I did. This method is helpful if you have a large dataset and frequently change which variables you use.

Step 1. Locate the dismo package's source code for gbm.plot (printing dismo::gbm.plot at the R console shows it). Copy all of the code into a new script and rename the function gbm.plot2. Search for "var.name" and replace every line that assigns a value to var.name. For example, change these lines:

var.name <- gbm.call$predictor.names[k]
var.name <- x.label 

to this:

var.name <- labels[j]

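Now save the script and load it with source(), or run the whole script, to get gbm.plot2 into the global environment. Note that gbm.plot2 now reads labels from the global environment, so that vector must exist (Steps 3-5) before you call it.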
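
Because labels[j] is matched to panels purely by position, the label vector has to follow gbm.plot's panel order exactly, which is what Steps 3-5 arrange. A less order-sensitive variant, offered here as an assumption rather than part of the method above, looks each label up by predictor name. label_for() and lookup are hypothetical names; in gbm.plot2 the replacement line would then read var.name <- label_for(gbm.call$predictor.names[k], lookup).

```r
# Hypothetical helper: return the display label for each predictor name,
# falling back to the raw name when no label has been defined.
label_for <- function(var_name, lookup) {
  hit <- lookup$Labels[match(var_name, lookup$ColumnNames)]
  ifelse(is.na(hit), var_name, hit)
}

lookup <- data.frame(
  ColumnNames = c("HiHorAve", "Elev", "Type5"),
  Labels = c("Hi Hello Avenue", "Probably Elevation", "Type 5 of Something"),
  stringsAsFactors = FALSE
)
label_for("Elev", lookup)   # "Probably Elevation"
label_for("Slope", lookup)  # "Slope" (no label defined, name is kept)
```

With a name-based lookup, reordering the labels by relative importance is no longer needed.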

Step 2. Suppose your data frame is called "df" and has 200 columns. Choose the column numbers of the predictors you want to pass to gbm.step.

vars <- c(17, 175, 198)

Step 3. Make a data frame with two columns: one holding all the variable names you might want to use, and one holding the labels you want for them. Make sure the entries in ColumnNames exactly match the names returned by colnames(df)[vars].

ColumnNames <- c("HiHorAve", "Elev", "Type5")
Labels <- c("Hi Hello Avenue", "Probably Elevation", "Type 5 of Something")
labels <- data.frame(ColumnNames, Labels, stringsAsFactors = FALSE)  # keep labels as character, not factor

Now order the labels to match the order in which the variables appear in your data frame. This is helpful if you have many variables and your data frame changes shape often.

labels <- labels[match(colnames(df)[vars], labels$ColumnNames),]
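
A toy check of the Step 3 reordering, using a 200-column stand-in for df in which three hypothetical predictors sit at the positions given in vars (the column placement is made up for illustration):

```r
# Toy stand-in for "df": 200 columns, three of them renamed as predictors
df <- as.data.frame(matrix(0, nrow = 2, ncol = 200))
vars <- c(17, 175, 198)
colnames(df)[vars] <- c("Type5", "HiHorAve", "Elev")  # hypothetical layout

labels <- data.frame(
  ColumnNames = c("HiHorAve", "Elev", "Type5"),
  Labels = c("Hi Hello Avenue", "Probably Elevation", "Type 5 of Something"),
  stringsAsFactors = FALSE
)
# Reorder the rows so they follow the column order of df[vars]
labels <- labels[match(colnames(df)[vars], labels$ColumnNames), ]
labels$Labels  # "Type 5 of Something" "Hi Hello Avenue" "Probably Elevation"
```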

Step 4. Run gbm.step like so:

BRTmodel<- gbm.step(data=df, gbm.x=vars, gbm.y = 5, .....)

Step 5. Get the model summary, which lists the variables by relative importance. Then arrange the labels in that same order.

smry1<- summary(BRTmodel)

labels <- labels[order(match(names(df)[vars], smry1$var)), ]  # reorder rows (note the comma)
labels <- labels$Labels #extract the labels to a vector
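
The Step 5 reordering must index rows: for a data frame, labels[i] with a single index selects columns, whereas labels[i, ] selects rows. Below is a toy check with made-up relative-influence values; vars_names stands in for names(df)[vars], and smry1 only mimics the var column of the summary table.

```r
# Toy stand-ins: labels in gbm.x order, and a summary-like table sorted by
# relative influence (the rel.inf values are invented for illustration)
vars_names <- c("Type5", "HiHorAve", "Elev")
labels <- data.frame(
  ColumnNames = vars_names,
  Labels = c("Type 5 of Something", "Hi Hello Avenue", "Probably Elevation"),
  stringsAsFactors = FALSE
)
smry1 <- data.frame(var = c("Elev", "Type5", "HiHorAve"),
                    rel.inf = c(60, 30, 10), stringsAsFactors = FALSE)

# Row-wise reordering so labels follow the importance ranking
labels <- labels[order(match(vars_names, smry1$var)), ]
labels <- labels$Labels
labels  # "Probably Elevation" "Type 5 of Something" "Hi Hello Avenue"
```

An equivalent, arguably more direct form is labels[match(smry1$var, labels$ColumnNames), ], which reads as "for each variable in importance order, find its label row".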

Step 6. Now run your new gbm.plot2 function!

  gbm.plot2(BRTmodel, n.plots=3, y.label="")

It should plot using only your custom labels.