ggplot: Boxplot of multiple column values

Question

Here is the type of data that I'm importing as a csv file:

RPID    mm  ID  Time    Freq    Freq.1  Freq.2
RPO483  1   B6AC    5   23301   30512   
RPO483  1   B6AC    25  19      17  
RPO244  1   B6C     5   14889   20461   
RPO244  1   B6C     25  81      86  
RPO876  1   G3G3A   5   106760  59950   103745
RPO876  1   G3G3A   25  4578    38119   37201
RPO876  7   F3G3A   5   205803  148469  173580
RPO876  7   F3G3A   25  28648   30321   26454
RPO939  7   F3E324A 5   242285      
RPO939  7   F3E324A 25  42837       
RPO934  7   F3E325A 5   242001  129272  112371
RPO934  7   F3E325A 25  73057   58685   66582

For each "ID", I'd like to generate a boxplot for values in columns "Freq", "Freq.1" and "Freq.2". However, currently I'm only able to successfully plot one Y value -- for example:

dataset <- read.csv("~/R/dataset.csv")
library(ggplot2)
p <- ggplot(dataset) 
p + geom_boxplot(aes(x=ID, y=Freq, color=mm))

I've tried something like y=c(Freq,Freq.1,Freq.2), but this results in the following:

Error: Aesthetics must either be length one, or the same length as the dataProblems:ID

I'm sure there is a simple solution to this, but as I am very new to R, I can't tell if it is a problem of wrong data format, wrong syntax, wrong package or something else entirely.

Any help would be greatly appreciated!

Solution

You need to reshape the data in order to plot.
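
A rough illustration of why, using a made-up three-row data frame (the name toy and its values are hypothetical, not your data): c(Freq, Freq.1, Freq.2) concatenates the three columns into one vector three times as long as ID, so ggplot cannot pair x and y values row by row.

```r
# Hypothetical toy data, only to show the length mismatch.
toy <- data.frame(
  ID     = c("a", "a", "b"),
  Freq   = c(1, 2, 3),
  Freq.1 = c(4, 5, 6),
  Freq.2 = c(7, 8, 9)
)

# Concatenating the three columns gives one long vector.
y <- with(toy, c(Freq, Freq.1, Freq.2))

length(toy$ID)  # 3
length(y)       # 9
```

In long format every value sits on its own row next to its ID, so the lengths match again.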

First, I read in your data. Note that some values are missing; the fill argument pads the short rows with NA.

dat <- read.table(text = '
RPID    mm  ID  Time    Freq    Freq.1  Freq.2
RPO483  1   B6AC    5   23301   30512   
RPO483  1   B6AC    25  19      17  
RPO244  1   B6C     5   14889   20461   
RPO244  1   B6C     25  81      86  
RPO876  1   G3G3A   5   106760  59950   103745
RPO876  1   G3G3A   25  4578    38119   37201
RPO876  7   F3G3A   5   205803  148469  173580
RPO876  7   F3G3A   25  28648   30321   26454
RPO939  7   F3E324A 5   242285      
RPO939  7   F3E324A 25  42837       
RPO934  7   F3E325A 5   242001  129272  112371
RPO934  7   F3E325A 25  73057   58685   66582', header=TRUE, fill=TRUE)
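
As a quick check of how the short rows are handled, here is a sketch on three rows copied from the data above (the name small is just for illustration). With whitespace-separated input, fill can only pad fields missing at the end of a row, which is the case here; a gap in the middle of a row would shift the later values to the left instead.

```r
# Three rows from the data above: one complete, one without Freq.2,
# and one with only Freq.
small <- read.table(text = '
ID      Freq    Freq.1  Freq.2
G3G3A   106760  59950   103745
B6AC    23301   30512
F3E324A 242285
', header = TRUE, fill = TRUE)

# Count the NA values that fill = TRUE put into each column.
na_counts <- colSums(is.na(small))
print(na_counts)
# ID: 0, Freq: 0, Freq.1: 1, Freq.2: 2
```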

Then melt the data into long format, using reshape2 for example:

library(reshape2)
dat.m <- melt(dat,id.vars='ID', measure.vars=c('Freq','Freq.1','Freq.2'))
library(ggplot2)
p <- ggplot(dat.m) +
      geom_boxplot(aes(x=ID, y=value, color=variable))
p  # print the plot; rows with NA values are dropped with a warning
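
The reshape2 package is retired, and its maintainers recommend tidyr instead. The same reshape might look like the sketch below, which uses tidyr's pivot_longer() on a small subset of the data; the data frame here is retyped by hand for illustration, and dat.long is a made-up name. Setting values_drop_na = TRUE removes the missing replicates up front, which avoids the warning from ggplot.

```r
library(tidyr)
library(ggplot2)

# Hand-typed subset of the data above, with NA where a replicate is missing.
dat <- data.frame(
  ID     = c("B6AC", "B6AC", "G3G3A", "G3G3A", "F3E324A", "F3E324A"),
  Freq   = c(23301, 19, 106760, 4578, 242285, 42837),
  Freq.1 = c(30512, 17, 59950, 38119, NA, NA),
  Freq.2 = c(NA, NA, 103745, 37201, NA, NA)
)

# One row per (ID, variable) value; NA rows are dropped.
dat.long <- pivot_longer(
  dat,
  cols = c(Freq, Freq.1, Freq.2),
  names_to = "variable",
  values_to = "value",
  values_drop_na = TRUE
)

# Same plot as with the melted data: one box per ID and Freq column.
p <- ggplot(dat.long, aes(x = ID, y = value, colour = variable)) +
  geom_boxplot()
print(p)
```

The column names variable and value are chosen to match what melt() produces, so the plotting code stays the same either way.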
