Visualising a three way interaction between two continuous variables and one categorical variable in R


Problem description

I have a model in R that includes a significant three-way interaction between two continuous independent variables (IVContinuousA and IVContinuousB) and one categorical variable, IVCategorical (with two levels: Control and Treatment). The dependent variable (DV) is continuous.

model <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical)

You can find the data here

I am trying to find out a way to visualise this in R to ease my interpretation of it (perhaps in ggplot2?).

Somewhat inspired by this blog post, I thought that I could dichotomise IVContinuousB into high and low values (so it would itself be a two-level factor):

IVContinuousBHigh <- mean(IVContinuousB) + sd (IVContinuousB) 
IVContinuousBLow <- mean(IVContinuousB) - sd (IVContinuousB)

I then planned to plot the relationship between DV and IVContinuousA, with fitted lines representing the slope of this relationship for each combination of IVCategorical and my new dichotomised IVContinuousB:

IVCategoricalControl and IVContinuousBHigh
IVCategoricalControl and IVContinuousBLow
IVCategoricalTreatment and IVContinuousBHigh
IVCategoricalTreatment and IVContinuousBLow

My first question: does this sound like a viable way to produce an interpretable plot of this three-way interaction? I want to avoid 3D plots if possible, as I don't find them intuitive. Or is there another way to go about it? Maybe facet plots for the different combinations above?

If it is an OK solution, my second question is: how do I generate the data to predict the fitted lines that represent the different combinations above?

Third question - does anyone have any advice as to how to code this up in ggplot2?

I posted a very similar question on Cross Validated but because it is more code related I thought I would try here instead (I will remove the CV post if this one is more relevant to the community :) )

Thanks so much in advance,

Sarah

Note that there are NAs (left as blanks) in the DV column and the design is unbalanced - with slightly different numbers of datapoints in the Control vs Treatment groups of the variable IVCategorical.

FYI, I have the code for visualising a two-way interaction between IVContinuousA and IVCategorical:

A <- ggplot(data = data,
            aes(x = AOTAverage, y = SciconC, group = MisinfoCondition,
                shape = MisinfoCondition, col = MisinfoCondition)) +
  geom_point(size = 2) +
  geom_smooth(method = 'lm', formula = y ~ x)

But what I want is to plot this relationship conditional on IVContinuousB....

Solution

Here are a couple of options for visualizing the model output in two dimensions. I'm assuming that the goal is to compare Treatment to Control.

library(tidyverse)
library(readxl)  # read_excel() lives in readxl, which library(tidyverse) does not attach

theme_set(theme_classic() +
          theme(panel.background=element_rect(colour="grey40", fill=NA)))

dat = read_excel("Some Data.xlsx")  # I downloaded your data file

mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data=dat)

# Function to create prediction grid data frame
make_pred_dat = function(data=dat, nA=20, nB=5) {
  nCat = length(unique(data$IVCategorical))
  d = with(data, 
           data.frame(IVContinuousA=rep(seq(min(IVContinuousA), max(IVContinuousA), length=nA), nB*nCat),
                      IVContinuousB=rep(rep(seq(min(IVContinuousB), max(IVContinuousB), length=nB), each=nA), nCat),
                      IVCategorical=rep(unique(IVCategorical), each=nA*nB)))

  d$DV = predict(mod, newdata=d)

  return(d)
}
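
The same kind of grid can also be built with expand.grid(), which avoids the nested rep() bookkeeping. The following is only a minimal sketch, not the code used above: the simulated data frame sim, the model sim_mod and the helper name make_pred_grid are assumptions, introduced so the example runs without the original spreadsheet.

```r
# Simulated stand-in for the original data (assumption: same column names as in the question)
set.seed(1)
n <- 200
sim <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = sample(c("Control", "Treatment"), n, replace = TRUE)
)
sim$DV <- with(sim, 1 + 0.5 * IVContinuousA - 0.3 * IVContinuousB +
                 0.4 * (IVCategorical == "Treatment") * IVContinuousA * IVContinuousB +
                 rnorm(n))

sim_mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim)

# Hypothetical helper: every combination of nA values of A, nB values of B,
# and each level of the categorical variable, plus the model prediction
make_pred_grid <- function(model, data, nA = 20, nB = 5) {
  grid <- expand.grid(
    IVContinuousA = seq(min(data$IVContinuousA), max(data$IVContinuousA), length.out = nA),
    IVContinuousB = seq(min(data$IVContinuousB), max(data$IVContinuousB), length.out = nB),
    IVCategorical = unique(data$IVCategorical),
    stringsAsFactors = FALSE
  )
  grid$DV <- predict(model, newdata = grid)
  grid
}

pred <- make_pred_grid(sim_mod, sim)
```

Passing the model explicitly, rather than relying on a global mod, makes it easier to reuse the helper with a refitted model.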

IVContinuousA vs. DV by levels of IVContinuousB

The roles of IVContinuousA and IVContinuousB can of course be switched here.

ggplot(make_pred_dat(), aes(x=IVContinuousA, y=DV, colour=IVCategorical)) + 
  geom_line() +
  facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinuousB") +
  labs(colour="")

You can make a similar plot without faceting, but it gets difficult to interpret as the number of IVContinuousB levels increases:

ggplot(make_pred_dat(nB=3), 
       aes(x=IVContinuousA, y=DV, colour=IVCategorical, linetype=factor(round(IVContinuousB,2)))) + 
  geom_line() +
  #facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinuousB") +
  labs(colour="", linetype="IVContinuousB") +
  scale_linetype_manual(values=c("1434","11","62")) +
  guides(linetype=guide_legend(reverse=TRUE))
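
The approach proposed in the question, drawing the fitted lines at a low and a high value of IVContinuousB (mean minus and plus one standard deviation), can be sketched along the same lines. This is an illustration under assumptions: sim is simulated data standing in for the original spreadsheet, and the object names (sim_mod, b_levels, pred_sd, BLevel) are made up for the example.

```r
# Simulated stand-in for the original data (assumption)
set.seed(42)
n <- 300
sim <- data.frame(
  IVContinuousA = runif(n, 0, 10),
  IVContinuousB = rnorm(n, mean = 5, sd = 2),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE))
)
sim$DV <- with(sim, 2 + 0.3 * IVContinuousA +
                 0.1 * IVContinuousA * IVContinuousB * (IVCategorical == "Treatment") +
                 rnorm(n))
sim$DV[sample(n, 10)] <- NA  # mimic missing DV values; dropped by lm() under the default na.action

sim_mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim)

# Low and high reference values of IVContinuousB: mean -/+ 1 SD
b_levels <- c(Low  = mean(sim$IVContinuousB) - sd(sim$IVContinuousB),
              High = mean(sim$IVContinuousB) + sd(sim$IVContinuousB))

pred_sd <- expand.grid(
  IVContinuousA = range(sim$IVContinuousA),  # two points are enough to draw a straight line
  IVContinuousB = unname(b_levels),
  IVCategorical = levels(sim$IVCategorical)
)
# Label each row as Low or High for the legend
pred_sd$BLevel <- factor(names(b_levels)[match(pred_sd$IVContinuousB, b_levels)],
                         levels = c("Low", "High"))
pred_sd$DV <- predict(sim_mod, newdata = pred_sd)

# Draw the four fitted lines, only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pred_sd, aes(x = IVContinuousA, y = DV,
                           colour = IVCategorical, linetype = BLevel)) +
    geom_line() +
    labs(colour = "", linetype = "IVContinuousB")
  print(p)
}
```

Note that IVContinuousB stays continuous in the model itself; only the prediction grid is restricted to two values, so no information is thrown away when fitting.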

Heat map of the model-predicted difference, DV treatment - DV control, on a grid of IVContinuousA and IVContinuousB values

Below, we look at the difference between treatment and control at each pair of IVContinuousA and IVContinuousB.

ggplot(make_pred_dat(nA=100, nB=100) %>% 
         group_by(IVContinuousA, IVContinuousB) %>% 
         arrange(IVCategorical) %>% 
         summarise(DV = diff(DV)), 
       aes(x=IVContinuousA, y=IVContinuousB)) + 
  geom_tile(aes(fill=DV)) +
  scale_fill_gradient2(low="red", mid="white", high="blue") +
  labs(fill=expression(Delta*DV~(Treatment - Control)))
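
Because the model is linear, the difference mapped to fill has a closed form: at a given pair of IVContinuousA and IVContinuousB values, Treatment minus Control equals the sum of the coefficients that involve the Treatment indicator, each multiplied by the matching covariate terms. The sketch below checks this on simulated data (an assumption, since the original data file is not reproduced here); the coefficient names assume R's default treatment contrasts with Control as the reference level.

```r
# Simulated stand-in for the original data (assumption)
set.seed(7)
n <- 250
sim <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE),
                         levels = c("Control", "Treatment"))
)
sim$DV <- with(sim, 1 + IVContinuousA + 0.5 * IVContinuousB +
                 (IVCategorical == "Treatment") * (0.8 - 0.6 * IVContinuousA * IVContinuousB) +
                 rnorm(n))
sim_mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim)

# An arbitrary point on the (A, B) plane
a <- 0.7
b <- -1.2

# Difference of predictions: Treatment minus Control (rows are ordered Control, Treatment)
new_pt <- data.frame(IVContinuousA = a, IVContinuousB = b,
                     IVCategorical = factor(c("Control", "Treatment"),
                                            levels = levels(sim$IVCategorical)))
diff_pred <- diff(predict(sim_mod, newdata = new_pt))

# The same difference assembled from the coefficients
cf <- coef(sim_mod)
diff_coef <- cf["IVCategoricalTreatment"] +
  cf["IVContinuousA:IVCategoricalTreatment"] * a +
  cf["IVContinuousB:IVCategoricalTreatment"] * b +
  cf["IVContinuousA:IVContinuousB:IVCategoricalTreatment"] * a * b

# The two should agree up to floating-point error
all.equal(unname(diff_pred), unname(diff_coef))
```

This also explains the shape of the heat map: along any horizontal or vertical slice the difference changes linearly, and the three-way interaction term is what lets the slope of that change vary across slices.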
