Visualise the relation between two variables in panel data

Problem description

I am familiar with R, but not very much with plotting. I have panel data as follows:

library(plm)
library(dplyr)
data("EmplUK", package="plm")
EmplUK <- EmplUK %>%
  group_by(firm, year) %>%
  mutate(Vote = sample(c(0,1),1),
         Vote_won = ifelse(Vote==1, sample(c(0,1),1),0))

# EDIT: convert to a pdata.frame indexed by firm and year
EmplUK <- pdata.frame(EmplUK, index=c("firm", "year"), drop.index = FALSE)

# A tibble: 1,031 x 9
# Groups:   firm, year [1,031]
    firm  year sector   emp  wage capital output  Vote Vote_won
   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>  <dbl> <dbl>    <dbl>
 1     1  1977      7  5.04  13.2   0.589   95.7     1        0
 2     1  1978      7  5.60  12.3   0.632   97.4     0        0
 3     1  1979      7  5.01  12.8   0.677   99.6     1        1
 4     1  1980      7  4.72  13.8   0.617  101.      1        1
 5     1  1981      7  4.09  14.3   0.508   99.6     0        0
 6     1  1982      7  3.17  14.9   0.423   98.6     0        0
 7     1  1983      7  2.94  13.8   0.392  100.      0        0
 8     2  1977      7 71.3   14.8  16.9     95.7     1        0
 9     2  1978      7 70.6   14.1  17.2     97.4     1        1
10     2  1979      7 70.9   15.0  17.5     99.6     1        1

toplot <- plm(output ~ wage, data=EmplUK, model="within")

Coefficients:
     Estimate Std. Error t-value   Pr(>|t|)    
wage   -0.707      0.143   -4.94 0.00000095 ***

I would like to evaluate which functional form (linear, quadratic, or higher-order polynomial) best describes the relation between two variables in panel data, by visualising the relation between output and wage and perhaps fitting linear, quadratic, and polynomial curves. I am, however, very unfamiliar with plotting.

I am looking for something like this (source) (where I get the formula for the fitted line):

I tried starting out as follows:

plot(EmplUK$output,EmplUK$wage,type='l',col='red',main='Linear relationship')

But that gives me this:

In all honesty, I have very little idea what I am doing here. Could anyone point me in the right direction?

Solution

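I would probably do it with the de-meaned data. Subtracting each firm's mean from output and wage removes the firm fixed effects, which is the same transformation the "within" model applies before estimating the slope.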

library(dplyr)
library(ggplot2)

# Subtract each firm's mean from output and wage (the "within" transformation)
demeaned_data <- EmplUK %>% 
  group_by(firm) %>% 
  mutate(across(c(output, wage), function(x) x - mean(x))) %>% 
  ungroup()

# Scatter of the de-meaned data with linear, quadratic and cubic fits overlaid
ggplot(demeaned_data, aes(x=wage, y=output)) + 
  geom_point() + 
  geom_smooth(aes(colour="linear", fill="linear"), 
              method="lm", 
              formula=y ~ x) + 
  geom_smooth(aes(colour="quadratic", fill="quadratic"), 
              method="lm", 
              formula=y ~ x + I(x^2)) + 
  geom_smooth(aes(colour="cubic", fill="cubic"), 
              method="lm", 
              formula=y ~ x + I(x^2) + I(x^3)) + 
  scale_fill_brewer(palette="Set1") + 
  scale_colour_brewer(palette="Set1") + 
  theme_classic() + 
  labs(colour="Functional Form", fill="Functional Form")
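To see why plotting the de-meaned data is a fair picture of the fixed-effects fit, one can check that the slope from a regression on the de-meaned variables matches the slope from OLS with one dummy per firm. The following is a minimal sketch using base R only (apart from loading the data from `plm`); the names `dm`, `b_demeaned` and `b_lsdv` are made up for this illustration and are not part of the original answer.

```r
# Sketch: the slope on firm-de-meaned data should equal the fixed-effects
# ("within") slope obtained from a dummy-variable regression.
data("EmplUK", package = "plm")   # assumes the plm package is installed

# De-mean output and wage within each firm; ave() returns group means
dm <- data.frame(
  output = EmplUK$output - ave(EmplUK$output, EmplUK$firm),
  wage   = EmplUK$wage   - ave(EmplUK$wage,   EmplUK$firm)
)

# Slope from the de-meaned regression (no intercept after de-meaning)
b_demeaned <- coef(lm(output ~ wage - 1, data = dm))[["wage"]]

# Slope from OLS with one dummy per firm
b_lsdv <- coef(lm(output ~ wage + factor(firm), data = EmplUK))[["wage"]]

# Compare the two slopes up to numerical tolerance
all.equal(b_demeaned, b_lsdv)
```

Note that this equivalence is exact only for the linear term. The quadratic and cubic curves in the plot are polynomials in the de-meaned wage, so they are best read as a visual guide rather than as the exact fixed-effects polynomial fit.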

An alternative would be to estimate the model with OLS and firm dummy variables; you could then get predictions for each firm and plot them separately.

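library(dplyr)
library(ggplot2)
library(ggeffects)

# Reload the raw data and treat firm as a factor so lm() creates firm dummies
data("EmplUK", package="plm")
EmplUK <- EmplUK %>% mutate(firm = as.factor(firm))

# Linear, quadratic and cubic specifications, each with firm dummies
m1 <- lm(output ~ wage + firm, data=EmplUK)
m2 <- lm(output ~ wage + I(wage^2) + firm, data=EmplUK)
m3 <- lm(output ~ wage + I(wage^2) + I(wage^3) + firm, data=EmplUK)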

p1 <- ggpredict(m1, terms=c("wage", "firm")) %>% 
  mutate(form="linear") %>% 
  rename("wage" = "x", 
         "firm" = "group", 
         "output" = "predicted")
p2 <- ggpredict(m2, terms=c("wage", "firm")) %>% 
  mutate(form="quadratic") %>% 
  rename("wage" = "x", 
         "firm" = "group", 
         "output" = "predicted")
p3 <- ggpredict(m3, terms=c("wage", "firm")) %>% 
  mutate(form="cubic") %>% 
  rename("wage" = "x", 
         "firm" = "group", 
         "output" = "predicted")

ggplot() + 
  geom_line(data=p1, aes(x=wage, y=output, colour="linear")) + 
  geom_line(data=p2, aes(x=wage, y=output, colour="quadratic")) + 
  geom_line(data=p3, aes(x=wage, y=output, colour="cubic")) + 
  geom_point(data=EmplUK, aes(x=wage, y=output)) + 
  facet_wrap(~firm) + 
  theme_bw() + 
  labs(colour="Functional\nForm")
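Since the goal is to decide between the linear, quadratic and cubic forms, the plots could be complemented with a numerical comparison of the three dummy-variable models. This is a sketch of one possible approach and is not part of the original answer: it refits the same three models and compares them with nested F-tests and AIC, using only base R functions. The names `ftab` and `cmp` are made up for this illustration.

```r
# Sketch: compare the three functional forms numerically.
data("EmplUK", package = "plm")   # assumes the plm package is installed
EmplUK$firm <- factor(EmplUK$firm)

# Same three specifications as in the answer, each with firm dummies
m1 <- lm(output ~ wage + firm, data = EmplUK)
m2 <- lm(output ~ wage + I(wage^2) + firm, data = EmplUK)
m3 <- lm(output ~ wage + I(wage^2) + I(wage^3) + firm, data = EmplUK)

# Nested F-tests: does each added polynomial term improve the fit?
ftab <- anova(m1, m2, m3)
print(ftab)

# Information criterion: lower AIC indicates a better fit/complexity trade-off
cmp <- AIC(m1, m2, m3)
print(cmp)
```

A small p-value in the second or third row of the F-test table would suggest that the extra term is worth keeping. Whichever form these criteria favour, it is worth checking it against the plots before settling on it.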
