Plotting a bivariate to multiple factors in R


Question


    First of all, I'm still a beginner. I'm trying to interpret and draw a stacked bar plot with R. I already took a look at a number of answers, but some were not specific to my case and others I simply didn't understand.

    I've got a dataset dvl that has five columns, Variant, Region, Time, Person and PrecededByPrep. I'd like to make a multivariate comparison of Variant to the other four predictors. Every column can have one of two possible values:

    • Variant: elk or ieder
    • Region: VL or NL
    • Time: time or no time
    • Person: person or no person
    • PrecededByPrep: 1 or 0

    Here's the logistic regression

    From the answers I gathered that the library ggplot2 might be the best drawing library to go with. I've read its documentation but for the life of me I can't figure out how to plot this: how can I get a comparison of Variant with the other four factors?

    It took me a while, but I made something similar in Photoshop to what I'd like (fictional values!).

    • Dark gray/light gray: possible values of Variant
    • y-axis: frequency
    • x-axis: every column, subdivided into its possible values

    I know how to make individual bar plots, both stacked and grouped, but basically I do not know how to make stacked, grouped bar plots. ggplot2 can be used, but if it can be done without it, I'd prefer that.

    I think this can be seen as a sample dataset, though I'm not entirely sure. I am a beginner with R and I read about creating a sample set.

    t <- data.frame(Variant = sample(c("iedere","elke"),size = 50, replace = TRUE),
                Region = sample(c("VL","NL"),size = 50, replace = TRUE),
                PrecededByPrep = sample(c("1","0"),size = 50, replace = TRUE),
                Person = sample(c("person","no person"),size = 50, replace = TRUE),
                Time = sample(c("time","no time"),size = 50, replace = TRUE))
    

    I'd like to have the plot to be aesthetically pleasing as well. What I had in mind:

    • Plot colours (i.e. for the bars): col=c("paleturquoise3", "palegreen3")
    • A bold font for the axis labels font.lab=2 but not for the value labels (e.g. region in bold, but VL and NL not in bold)
    • #404040 as a colour for the font, axis and lines
    • Labels for the axes: x: factors, y: frequency

    Solution

    Here is one possibility: start with the 'un-tabulated' data frame, melt it, plot it with geom_bar in ggplot2 (which does the counting per group), and separate the plot by variable using facet_wrap.

    Create toy data:

    set.seed(123)
    df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
               Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
               PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
               Person = sample(c("person", "no person"), size = 50, replace = TRUE),
               Time = sample(c("time", "no time"), size = 50, replace = TRUE))
    

    Reshape data:

    library(reshape2)
    df2 <- melt(df, id.vars = "Variant")
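
    For readers who want to see what the reshape step does, here is a minimal base R sketch that builds the same long layout by hand. The names id_var, measure_vars and df2_base are only illustrative and do not come from the answer, and the result is assumed (not guaranteed) to match melt's output column for column.

```r
# Toy data as in the answer
set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE))

id_var <- "Variant"                              # column kept as identifier
measure_vars <- setdiff(names(df), id_var)       # columns stacked into key/value pairs

# Long format: one row per (observation, predictor) pair
df2_base <- data.frame(
  Variant  = rep(df[[id_var]], times = length(measure_vars)),
  variable = factor(rep(measure_vars, each = nrow(df)), levels = measure_vars),
  value    = unlist(lapply(df[measure_vars], as.character), use.names = FALSE)
)

str(df2_base)
```

    Each of the 50 original rows appears four times, once per predictor, which is what lets a single plot call facet on variable.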
    

    Plot:

    library(ggplot2)
    ggplot(data = df2, aes(factor(value), fill = Variant)) +
      geom_bar() +
      facet_wrap(~variable, nrow = 1, scales = "free_x") +
      scale_fill_grey(start = 0.5) +
      theme_bw()
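
    Since the question prefers a solution without ggplot2, the following is a rough base graphics sketch of the same idea: one stacked bar panel per predictor. It is not part of the original answer; predictors and count_tables are hypothetical names, and the colours and font settings are taken from the question's wish list.

```r
# Toy data as in the answer
set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE))

predictors <- setdiff(names(df), "Variant")

# One Variant-by-level count table per predictor
count_tables <- lapply(df[predictors], function(x) table(Variant = df$Variant, x))

# One panel per predictor; barplot() stacks the rows of a matrix by default
op <- par(mfrow = c(1, length(predictors)),
          col.axis = "#404040", col.lab = "#404040", font.lab = 2)
for (p in predictors) {
  barplot(count_tables[[p]],
          col = c("paleturquoise3", "palegreen3"),
          xlab = p, ylab = "frequency",
          # draw the legend only in the last panel
          legend.text = (p == predictors[length(predictors)]))
}
par(op)  # restore the previous graphics settings
```

    The panels do not share a y-axis the way facets do, so a common ylim would have to be set by hand if the bars are to be compared across panels.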
    

    There are lots of opportunities to customize the plot, such as setting order of factor levels, rotating axis labels, wrapping facet labels on two lines (e.g. for the longer variable name "PrecededByPrep"), or changing spacing between facets.
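
    As a small illustration of the first of these options, the sketch below sets the order of factor levels explicitly. ggplot2 orders fill groups and legend entries by factor level, so applying the same call to df2$Variant before plotting should change the order in the plot; default_order and custom_order are illustrative names only.

```r
set.seed(123)
variant <- sample(c("iedere", "elke"), size = 50, replace = TRUE)

# Without explicit levels, factor() sorts the levels alphabetically
default_order <- factor(variant)

# Explicit levels put "iedere" first
custom_order <- factor(variant, levels = c("iedere", "elke"))

levels(default_order)
levels(custom_order)
```

    Only the level order changes; the underlying values and their counts stay the same.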

    Customization (following updates in question and comments by OP)

    # labeller used in facet_grid to wrap "PrecededByPrep" on two lines
    # see http://www.cookbook-r.com/Graphs/Facets_%28ggplot2%29/#modifying-facet-label-text
    # as_labeller() (current ggplot2 releases) wraps a function that maps a
    # character vector of facet values to labels; other values pass through unchanged
    my_lab <- as_labeller(function(value) {
      ifelse(value == "PrecededByPrep", "Preceded\nByPrep", value)
    })
    
    ggplot(data = df2, aes(factor(value), fill = Variant)) +
      geom_bar() +
      facet_grid(~variable, scales = "free_x", labeller = my_lab) + 
      scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
      theme_bw() +
      theme(axis.text = element_text(face = "bold"), # axis tick labels bold 
            axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
            line = element_line(colour = "gray25"), # line colour gray25 = #404040
            strip.text = element_text(face = "bold")) + # facet labels bold  
      xlab("factors") + # set axis labels
      ylab("frequency")
    

    Add counts to each bar (edit following comments from OP).

    The basic principles to calculate the y coordinates can be found in this Q&A. Here I use dplyr to calculate counts per bar (i.e. label in geom_text) and their y coordinates, but this could of course be done in base R, plyr or data.table.

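    # calculate counts (i.e. labels for geom_text) and their y positions.
    library(dplyr)
    df3 <- df2 %>%
      group_by(variable, value, Variant) %>%
      summarise(n = n()) %>%  # result stays grouped by variable and value (one group per bar)
      # recent ggplot2 versions stack the first fill level on top of each bar,
      # so accumulate y starting from the last level
      arrange(desc(Variant), .by_group = TRUE) %>%
      mutate(y = cumsum(n) - (0.5 * n))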
    
    # plot
    ggplot(data = df2, aes(x = factor(value), fill = Variant)) +
      geom_bar() +
      geom_text(data = df3, aes(y = y, label = n)) +
      facet_grid(~variable, scales = "free_x", labeller = my_lab) + 
      scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
      theme_bw() +
      theme(axis.text = element_text(face = "bold"), # axis tick labels bold 
            axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
            line = element_line(colour = "gray25"), # line colour gray25 = #404040
            strip.text = element_text(face = "bold")) + # facet labels bold  
      xlab("factors") + # set axis labels
      ylab("frequency")
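
    The answer notes that the counts and label positions could also be computed in base R. A minimal sketch of that, under the assumption that the first Variant level is drawn at the top of each stacked bar (as in recent ggplot2 versions), might look like this; counts and predictors are hypothetical names.

```r
# Toy data as in the answer
set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE))

predictors <- setdiff(names(df), "Variant")

# Count Variant within each level of each predictor, then bind the blocks together
counts <- do.call(rbind, lapply(predictors, function(p) {
  tab <- as.data.frame(table(value = df[[p]], Variant = df$Variant),
                       responseName = "n", stringsAsFactors = FALSE)
  cbind(variable = p, tab)
}))
counts <- counts[counts$n > 0, ]  # drop empty combinations, if any

# Within each bar, sort Variant in descending order so the last level is accumulated first
counts <- counts[order(counts$variable, counts$value, -xtfrm(counts$Variant)), ]

# Label position: middle of each segment
counts$y <- ave(counts$n, counts$variable, counts$value, FUN = cumsum) - 0.5 * counts$n

counts
```

    The resulting data frame has the same columns as df3 (variable, value, Variant, n, y), so it could be passed to geom_text in its place.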
    
