Stacked bar plot in R with ratio line overplot
Problem description
I have data with one observation per row:
rm(list = ls(all = TRUE))
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
                   var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE),
                   var2 = sample(c("yes", "no"), 100, replace = TRUE),
                   var3 = sample(c("yes", "no"), 100, replace = TRUE),
                   var4 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE),
                   var5 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE),
                   var6 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE))
I need to make a stacked bar chart with side-by-side bar pairs, one bar for each kind (good vs bad), showing how many observations of each kind have 0 "yes" vars, how many have 1 "yes" var, and so on, up to "yes" for all 6 vars. Y-axis = count, X-axis = the seven categories (0 yes vars, 1 yes var, etc.). Each bar should be stacked and color-coded to show the contribution of each var to the total height of the bar. NAs are treated as "no". I also want to overplot a line showing the ratio count(good)/count(bad) for each of the seven X-axis categories.
Based on your description, here is what I understand you are trying to achieve. It consists of three steps:
- Replace all NAs with "no".
- Add up all the "yes" values row by row.
- Plot the graph.
Let's address each point in turn.
Let's assume that your data is as follows:
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
                   var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE),
                   var2 = sample(c("yes", "no"), 100, replace = TRUE),
                   var3 = sample(c("yes", "no"), 100, replace = TRUE),
                   var4 = sample(c("yes", "no", "yes", "no", NA), 100, replace = TRUE),
                   var5 = sample(c("yes", "no", "yes", "no", NA), 100, replace = TRUE),
                   var6 = sample(c("yes", "no", "yes", "no", NA), 100, replace = TRUE))
1
Replacing all NAs with "no" is simply:
mydf[is.na(mydf)] <- "no"
Here is.na(mydf) finds every missing cell in the data.frame, and the assignment operator replaces each of them with "no".
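To see what that one line does, here is a small sketch on a hypothetical three-row data frame (`toy` and `missing` are illustrative names, not part of the original data). `is.na()` on a data frame returns a logical matrix of the same shape, and indexing with that matrix on the left of `<-` overwrites only the missing cells.

```r
# Toy data frame (hypothetical, for illustration only)
toy <- data.frame(var1 = c("yes", NA, "no"),
                  var2 = c(NA, "yes", "yes"),
                  stringsAsFactors = FALSE)

# is.na() returns a logical matrix with one entry per cell
missing <- is.na(toy)

# Assign "no" to every cell flagged as missing
toy[missing] <- "no"

print(toy)
```

If the columns are factors (the default for data.frame() before R 4.0.0), the replacement value must already be one of the factor levels. That is the case for the var columns above, since "no" occurs in each of them.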
2
To add everything up row-wise I used the apply function. You can type ?apply to see its arguments, but in a nutshell: the first argument is the data.frame, the second is the direction (1 for row-wise, 2 for column-wise), and the third is the function to apply along that direction.
mydf$total.yes <- apply(mydf, 1, function(x) {
  return(length(x[x == "yes"]))
})
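As an alternative sketch (not the method used in this answer), the same count can be computed with `rowSums()` on a logical comparison restricted to the var columns. The names `toy`, `var_cols`, `with_na` and `total_yes`, and the `grep("^var", ...)` column selection, are assumptions made for illustration. The example also shows why the NA replacement in step 1 matters for the `apply` version: `x[x == "yes"]` keeps `NA` entries, so `length()` counts them as well.

```r
# Toy data with missing values left in place (hypothetical)
toy <- data.frame(kind = c("good", "bad"),
                  var1 = c("yes", NA),
                  var2 = c("yes", "no"),
                  var3 = c(NA, "yes"),
                  stringsAsFactors = FALSE)

# Names of the answer columns only, so that `kind` is never compared
var_cols <- grep("^var", names(toy), value = TRUE)

# apply() version WITHOUT replacing NA first: NA entries are counted too
with_na <- apply(toy, 1, function(x) length(x[x == "yes"]))

# rowSums() version: na.rm = TRUE treats NA as "not yes"
total_yes <- rowSums(toy[var_cols] == "yes", na.rm = TRUE)

print(with_na)
print(total_yes)
```

With `rowSums(..., na.rm = TRUE)` the explicit replacement of NA becomes optional; with the `apply`/`length` version it is required.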
3
Lastly, the plot. The easiest way to produce a good-looking plot is ggplot2; install it with install.packages("ggplot2"). For details on bar plots, see the geom_bar documentation (http://docs.ggplot2.org/0.9.3.1/geom_bar.html). The code looks like this:
library(ggplot2)
ggplot(mydf, aes(total.yes, fill = kind)) +
  geom_bar(position = "dodge")
This draws the good and bad bars side by side for each value of total.yes.
I hope this answers your question. The full code is as follows:
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
                   var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE),
                   var2 = sample(c("yes", "no"), 100, replace = TRUE),
                   var3 = sample(c("yes", "no"), 100, replace = TRUE),
                   var4 = sample(c("yes", "no", "yes", "no", NA), 100, replace = TRUE),
                   var5 = sample(c("yes", "no", "yes", "no", NA), 100, replace = TRUE),
                   var6 = sample(c("yes", "no", "yes", "no", NA), 100, replace = TRUE))
library(ggplot2)
# replace all NA values with "no"; without this step, x[x == "yes"] below
# would keep the NA entries and length() would count them as well
mydf[is.na(mydf)] <- "no"
# for each row, count how many "yes" values there are
mydf$total.yes <- apply(mydf, 1, function(x) {
  return(length(x[x == "yes"]))
})
# see example here: http://docs.ggplot2.org/0.9.3.1/geom_bar.html
# using your data
ggplot(mydf, aes(total.yes, fill = kind)) +
  geom_bar(position = "dodge")
Note that geom_bar stacks bars by default (see the documentation at http://docs.ggplot2.org/0.9.3.1/geom_bar.html). Without position="dodge", the stacked version is produced by:
ggplot(mydf, aes(total.yes, fill = kind)) +
  geom_bar()
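The answer above does not cover the ratio line from the question. The following is only a minimal sketch of one way it might be added, under several assumptions: the ratio is count(good)/count(bad) per total.yes category, it is drawn on the same y-axis as the counts (no secondary axis), and `counts`, `ratio_df` and `p` are hypothetical names. To keep the block self-contained, total.yes is filled with random stand-in values rather than computed from the var columns.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Stand-in data: `kind` as in the question, total.yes drawn at random
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
                   stringsAsFactors = FALSE)
mydf$total.yes <- sample(0:6, 100, replace = TRUE)

# Cross-tabulate kind by number of "yes" answers; fixing the levels keeps
# both kinds and all seven categories even if a combination never occurs
counts <- table(factor(mydf$kind, levels = c("good", "bad")),
                factor(mydf$total.yes, levels = 0:6))

# One ratio per category; Inf or NaN appears where the "bad" count is zero
ratio_df <- data.frame(total.yes = 0:6,
                       ratio = as.numeric(counts["good", ] / counts["bad", ]))

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mydf, aes(total.yes, fill = kind)) +
    geom_bar(position = "dodge") +
    geom_line(data = ratio_df, aes(total.yes, ratio), inherit.aes = FALSE) +
    geom_point(data = ratio_df, aes(total.yes, ratio), inherit.aes = FALSE)
  # call print(p) to draw the plot
}
```

Because the ratio is usually close to 1 while the counts are larger, the line will sit near the bottom of the plot; rescaling the ratio or using a secondary axis may be preferable in practice. Coloring each bar by the contribution of the individual vars, as the question also asks, is not addressed here.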