How to do histograms of this row-column table in R ggplot?
Question
I am trying to plot the descriptive variables in the first row with the following procedure. I also tried quoting the column/row names, without success.
- Rotate the rows and columns of the CSV data to get the corresponding data structure (tall table) required in the thread "A very simple histogram with R?", then use ggplot to plot a histogram of events as the Absolute variable XOR (Average, Min, Max).
- If there is only an absolute value, just draw the absolute value in the histogram.
- If there are (average, min and max), just draw them in the histogram with whiskers (= whisker plot), where the limits of the whiskers are given by the min and max.
Data
Initially, data.csv:

    "Vars"    , "Sleep", "Awake", "REM", "Deep"
    "Absolute",        ,        , 5    , 7
    "Average" , 7      , 12     ,      ,
    "Min"     , 4      , 5      ,      ,
    "Max"     , 10     , 15     ,      ,

Data after reshaping, visually:

          V1       V2      V3   V4
    Vars  Absolute Average Min  Max
    Sleep <NA>     7       4    10
    Awake <NA>     12      5    15
    REM   5        <NA>    <NA> <NA>
    Deep  7        <NA>    <NA> <NA>

Data after reshaping, for R:

    data <- structure(list(
      V1 = structure(c(3L, NA, NA, 1L, 2L),
        .Names = c("Vars", "Sleep", "Awake", "REM", "Deep"),
        .Label = c(" 5", " 7", "Absolute"), class = "factor"),
      V2 = structure(c(3L, 2L, 1L, NA, NA),
        .Names = c("Vars", "Sleep", "Awake", "REM", "Deep"),
        .Label = c("12", " 7", "Average "), class = "factor"),
      V3 = structure(c(3L, 1L, 2L, NA, NA),
        .Names = c("Vars", "Sleep", "Awake", "REM", "Deep"),
        .Label = c(" 4", " 5", "Min "), class = "factor"),
      V4 = structure(c(3L, 1L, 2L, NA, NA),
        .Names = c("Vars", "Sleep", "Awake", "REM", "Deep"),
        .Label = c("10", "15", "Max "), class = "factor")),
      .Names = c("V1", "V2", "V3", "V4"),
      row.names = c("Vars", "Sleep", "Awake", "REM", "Deep"),
      class = "data.frame")
R code with debugging code

    dat.m <- read.csv("data.csv")

    # rotate rows and columns
    dat.m <- as.data.frame(t(dat.m)) # https://stackoverflow.com/a/7342329/54964 Comment 42-

    library("reshape2")
    dat.m <- melt(dat.m, id.vars="Vars")

    ## Just plot values existing there correspondingly
    library("ggplot2")
    # https://stackoverflow.com/a/25584792/54964
    # TODO following
    #ggplot(dat.m, aes(x = "Vars", y = value,fill=variable))

Error

    Error: id variables not found in data: Vars
    Execution halted
R: 3.3.3, 3.4.0 (backports)
OS: Debian 8.7
R reshape2, ggplot2, ... with sessionInfo() after loading the two packages:

    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] ggplot2_2.1.0  reshape2_1.4.2

    loaded via a namespace (and not attached):
     [1] colorspace_1.3-2 scales_0.4.1     magrittr_1.5     plyr_1.8.4
     [5] tools_3.3.3      gtable_0.2.0     Rcpp_0.12.10     stringi_1.1.5
     [9] grid_3.3.3       stringr_1.2.0    munsell_0.4.3
Testing HaberdashPI's proposal
The output is in Fig. 1, where an absolute value is wrongly shown for Sleep and Awake. If a value is NA, just set it to zero.
Fig. 1 HaberdashPI's proposal output, not as expected (image not preserved)
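One way to set NA to zero could look like the sketch below. The `value` vector is typed in by hand from the `dat.m` listing in this question, and `num` is a name chosen only for illustration.

```r
# Hypothetical values copied by hand from the `value` column of dat.m
value <- c(NA, NA, " 5", " 7", " 7", "12")

# as.numeric() ignores surrounding whitespace; NA stays NA
num <- as.numeric(value)

# Replace missing values with zero before plotting
num[is.na(num)] <- 0
print(num)
```

Whether zero is the right stand-in is a judgment call: a zero-height bar says "measured as zero", while a dropped row says "not measured".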
Data structure of dat.m before the transpose:

    'data.frame': 4 obs. of 5 variables:
     $ Absolute: Factor w/ 2 levels " 5"," 7": NA NA 1 2
      ..- attr(*, "names")= chr "Sleep" "Awake" "REM" "Deep"
     $ Average : Factor w/ 2 levels "12"," 7": 2 1 NA NA
      ..- attr(*, "names")= chr "Sleep" "Awake" "REM" "Deep"
     $ Min     : Factor w/ 2 levels " 4"," 5": 1 2 NA NA
      ..- attr(*, "names")= chr "Sleep" "Awake" "REM" "Deep"
     $ Max     : Factor w/ 2 levels "10","15": 1 2 NA NA
      ..- attr(*, "names")= chr "Sleep" "Awake" "REM" "Deep"
     $ Vars    : chr "Sleep" "Awake" "REM" "Deep"
          Absolute Average  Min  Max  Vars
    Sleep     <NA>       7    4   10 Sleep
    Awake     <NA>      12    5   15 Awake
    REM          5    <NA> <NA> <NA>   REM
    Deep         7    <NA> <NA> <NA>  Deep

Data structure of dat.m after the transpose:

    'data.frame': 16 obs. of 3 variables:
     $ Vars    : chr "Sleep" "Awake" "REM" "Deep" ...
     $ variable: Factor w/ 4 levels "Absolute","Average ",..: 1 1 1 1 2 2 2 2 3 3 ...
     $ value   : chr NA NA " 5" " 7" ...
        Vars variable value
    1  Sleep Absolute  <NA>
    2  Awake Absolute  <NA>
    3    REM Absolute     5
    4   Deep Absolute     7
    5  Sleep Average      7
    6  Awake Average     12
    7    REM Average   <NA>
    8   Deep Average   <NA>
    9  Sleep Min          4
    10 Awake Min          5
    11   REM Min       <NA>
    12  Deep Min       <NA>
    13 Sleep Max         10
    14 Awake Max         15
    15   REM Max       <NA>
    16  Deep Max       <NA>
Testing akash87's proposal
Code

    ds <- dat.m
    str(ds)
    ds
    ds$variable
    ds$variable %in% c("Min","Max")

The output is wrong because the final result is all FALSE:

     $ Vars    : chr "Sleep" "Awake" "REM" "Deep" ...
     $ variable: Factor w/ 4 levels "Absolute","Average ",..: 1 1 1 1 2 2 2 2 3 3 ...
     $ value   : chr NA NA " 5" " 7" ...
        Vars variable value
    1  Sleep Absolute  <NA>
    2  Awake Absolute  <NA>
    3    REM Absolute     5
    4   Deep Absolute     7
    5  Sleep Average      7
    6  Awake Average     12
    7    REM Average   <NA>
    8   Deep Average   <NA>
    9  Sleep Min          4
    10 Awake Min          5
    11   REM Min       <NA>
    12  Deep Min       <NA>
    13 Sleep Max         10
    14 Awake Max         15
    15   REM Max       <NA>
    16  Deep Max       <NA>
    [1] "hello 3"
     [1] Absolute Absolute Absolute Absolute Average  Average  Average  Average
     [9] Min      Min      Min      Min      Max      Max      Max      Max
    Levels: Absolute Average  Min  Max
     [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    [13] FALSE FALSE FALSE FALSE
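A likely cause of the all-FALSE result is trailing whitespace in the factor levels: the `str()` output shows `"Average "`, and the `dput()` output in the Data section shows `"Min "` and `"Max "`. The sketch below reproduces the effect with levels typed in by hand; `variable` and `trimmed` are illustrative names, and the diagnosis is an assumption based on the printed output only.

```r
# Levels typed in by hand to mimic the str()/dput() output (note the trailing spaces)
variable <- factor(c("Absolute", "Average ", "Min ", "Max "))

# Exact matching fails: "Min " is not the same string as "Min"
print(variable %in% c("Min", "Max"))

# Strip the whitespace first, then match
trimmed <- trimws(as.character(variable))
print(trimmed %in% c("Min", "Max"))
```

Printing a factor does not make trailing spaces obvious, which is why the `Levels:` line in the output looks fine.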
So doing ds[ds$variable %in% c("Min","Max"), ] will give the wrong output, because the error is carried forward.

Testing Uwe's proposal
The code uses explicit data.table::dcast and two calls to data.table::melt. sessionInfo() is printed just before the line molten <- .... Note that library(ggplot2) is not loaded yet, because the error comes from the line molten <- ....

    $ Rscript test111.r
        Vars "Average" "Max" "Min" Absolute
    1: Sleep         7    10     4       NA
    2: Awake        12    15     5       NA
    3:   REM        NA    NA    NA        5
    4:  Deep        NA    NA    NA        7
    R version 3.4.0 (2017-04-21)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Debian GNU/Linux 8 (jessie)

    Matrix products: default
    BLAS: /usr/lib/openblas-base/libblas.so.3
    LAPACK: /usr/lib/libopenblasp-r0.2.12.so

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  base

    other attached packages:
    [1] data.table_1.10.4

    loaded via a namespace (and not attached):
    [1] compiler_3.4.0 methods_3.4.0
    Error in melt.data.table(transposed, measure.vars = c("Absolute", "Average")) :
      One or more values in 'measure.vars' is invalid.
    Calls: <Anonymous> -> melt.data.table
    Execution halted
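The printed header suggests that some column names contain literal quote characters (`"Average"` rather than `Average`), which would explain why `measure.vars` is reported as invalid. The sketch below illustrates that reading under that assumption; `transposed` here is a small hand-made stand-in, not the real object, and the placeholder columns `a` and `b` are hypothetical.

```r
# Hypothetical stand-in for `transposed`; names copied from the printed header
transposed <- data.frame(Vars = c("Sleep", "Awake", "REM", "Deep"),
                         a = c(7, 12, NA, NA),
                         b = c(NA, NA, 5, 7))
names(transposed) <- c("Vars", "\"Average\"", "Absolute")

# The quoted name does not match the plain string
print("Average" %in% names(transposed))

# Drop literal quote characters and surrounding whitespace from the names
names(transposed) <- trimws(gsub("\"", "", names(transposed), fixed = TRUE))
print(names(transposed))
```

After the names are cleaned, a call such as `measure.vars = c("Absolute", "Average")` refers to columns that actually exist.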
Testing Uwe's proposal with test code 2
Code

    molten <- structure(list(
      Vars = structure(c(1L, 2L, 1L, 2L, 1L, 2L), class = "factor",
        .Label = c("V1", "V2")),
      variable = structure(c(1L, 1L, 2L, 2L, 3L, 3L), class = "factor",
        .Label = c("ave", "ave_max", "lepo")),
      value = c(7L, 8L, 10L, 10L, 4L, 4L)),
      .Names = c("Vars", "variable", "value"),
      row.names = c(NA, -6L),
      class = c("data.table", "data.frame"))
    print(molten)

    library(ggplot2)
    ggplot(molten, aes(x = Vars, y = value, fill = variable, ymin = lepo, ymax = ave_max)) +
      geom_col() +
      geom_errorbar(width = 0.2)

Output

      Vars variable value
    1   V1      ave     7
    2   V2      ave     8
    3   V1  ave_max    10
    4   V2  ave_max    10
    5   V1     lepo     4
    6   V2     lepo     4
    Error in FUN(X[[i]], ...) : object 'lepo' not found
    Calls: <Anonymous> ... by_layer -> f -> <Anonymous> -> f -> lapply -> FUN -> FUN
    Execution halted
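The error is consistent with `lepo` and `ave_max` being values of the `variable` column here rather than columns of their own, so `aes(ymin = lepo, ymax = ave_max)` has nothing to look up. The base R sketch below contrasts the two layouts; `wide`, `full` and `partly` are illustrative names, and the numbers are the same ones as in the test data.

```r
# A wide version of the same numbers, typed in as a plain data.frame
wide <- data.frame(Vars = c("V1", "V2"), ave = c(7L, 8L),
                   ave_max = c(10L, 10L), lepo = c(4L, 4L))

# Fully molten layout, as in the test above: lepo is a value, not a column
full <- data.frame(Vars = rep(wide$Vars, 3),
                   variable = rep(c("ave", "ave_max", "lepo"), each = 2),
                   value = c(wide$ave, wide$ave_max, wide$lepo))
print("lepo" %in% names(full))

# Melting only `ave` by hand keeps ave_max and lepo available for ymin/ymax
partly <- data.frame(wide[c("Vars", "ave_max", "lepo")],
                     variable = "ave", value = wide$ave)
print(names(partly))
```

In the second layout each row carries its own `lepo` and `ave_max`, which is what `geom_errorbar` needs.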
Solution
The problem with your code is that you used "Vars" in quotes instead of plain Vars in the ggplot aes function. Also, the header of your data set is messed up: Absolute, Average, ... should be the column names of the data set, not values in it. That is why you get the error from the melt function.
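As a rough illustration of the header point, the base R sketch below starts from the stats-in-rows layout of data.csv and moves the labels into row names before transposing, so that Absolute, Average, ... end up as column names. The values are typed in by hand, and `by_stat`, `m` and `wide` are names made up for this sketch.

```r
# Stats-in-rows layout, as in data.csv (values typed in by hand)
by_stat <- data.frame(
  Vars  = c("Absolute", "Average", "Min", "Max"),
  Sleep = c(NA, 7, 4, 10),
  Awake = c(NA, 12, 5, 15),
  REM   = c(5, NA, NA, NA),
  Deep  = c(7, NA, NA, NA)
)

# Move the labels into row names first so that t() only sees numbers
m <- as.matrix(by_stat[-1])
rownames(m) <- by_stat$Vars

# Transpose: one row per sleep state, one numeric column per statistic
wide <- as.data.frame(t(m))
wide$Vars <- rownames(wide)
str(wide)
```

Because the label column is excluded from the matrix, the transposed columns stay numeric instead of turning into factors or strings, and `Vars` exists as a real column for `melt(..., id.vars = "Vars")`.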
Given your data set, here is my attempt:

    library(reshape2)
    library(ggplot2)

    # Data
    data = cbind.data.frame(c("Sleep", "Awake", "REM", "Deep"),
                            c(NA, NA, 5, 7),
                            c(7, 12, NA, NA),
                            c(4, 5, NA, NA),
                            c(10, 15, NA, NA))
    colnames(data) = c("Vars", "Absolute", "Average", "Min", "Max")

    # Reshape to long format
    dat.m <- melt(data, id.vars = "Vars")

    # Stacked plot
    ggplot(dat.m, aes(x = Vars, y = value)) +
      geom_bar(aes(fill = variable), stat = "identity")

This will produce a stacked bar chart (image not preserved).

    # Or multiple bars
    ggplot(dat.m, aes(x = Vars, y = value)) +
      geom_bar(aes(fill = variable), stat = "identity", position = "dodge")

    # Or separated by Vars
    ggplot(dat.m, aes(x = Vars, y = value)) +
      geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
      facet_wrap(~ Vars, scales = "free")

I am adding another graph to the answer. This works together with @Uwe's answer.

    # Data
    data <- structure(list(
      Vars = structure(1:2, class = "factor", .Label = c("V1", "V2")),
      ave = c(7L, 8L),
      ave_max = c(10L, 10L),
      lepo = c(4L, 4L)),
      .Names = c("Vars", "ave", "ave_max", "lepo"),
      row.names = c(NA, -2L),
      class = c("data.table", "data.frame"),
      sorted = "Vars")

    # Melt only `ave`, so that lepo and ave_max stay as columns
    library(data.table)
    mo = data.table::melt(data, measure.vars = c("ave"))

    ggplot(mo, aes(x = Vars, y = value, fill = variable, ymin = lepo, ymax = ave_max)) +
      geom_col() +
      geom_errorbar(width = 0.2)

This will produce a bar chart with error bars (image not preserved).
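The same idea can be applied to the sleep data from the question. The sketch below is one possible reading of the request, not a tested solution from the thread: it draws Absolute where it exists, otherwise Average, and adds whiskers from Min to Max. The columns `height` and `kind` and the object names `wide` and `p` are invented for illustration, and `geom_col` needs ggplot2 2.2.0 or later.

```r
library(ggplot2)

# Wide table from the question: one row per sleep state
wide <- data.frame(
  Vars     = c("Sleep", "Awake", "REM", "Deep"),
  Absolute = c(NA, NA, 5, 7),
  Average  = c(7, 12, NA, NA),
  Min      = c(4, 5, NA, NA),
  Max      = c(10, 15, NA, NA)
)

# Bar height: Absolute where present, otherwise Average
wide$height <- ifelse(is.na(wide$Absolute), wide$Average, wide$Absolute)
wide$kind   <- ifelse(is.na(wide$Absolute), "Average", "Absolute")

# Whiskers are only drawn where Min and Max exist; na.rm silences the warning
p <- ggplot(wide, aes(x = Vars, y = height, fill = kind)) +
  geom_col() +
  geom_errorbar(aes(ymin = Min, ymax = Max), width = 0.2, na.rm = TRUE)
print(p)
```

Keeping Min and Max as columns of the same row, instead of melting them, is what lets `geom_errorbar` pair each bar with its limits.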