How to reshape data for a stacked barchart using R lattice
Question
I have a bunch of data in a table (imported from csv) in the following format:
date classes score
9/1/11 french 34
9/1/11 english 34
9/1/11 french 34
9/1/11 spanish 34
9/2/11 french 34
9/2/11 english 34
9/3/11 spanish 34
9/3/11 spanish 34
9/5/11 spanish 34
9/5/11 english 34
9/5/11 french 34
9/5/11 english 34
Ignore the score column, it's not important.
I need a tally of the number of students taking English, Spanish, or French classes on each date, i.e. I need to first group the data by date, then divide each day into blocks by language, and plot the result as a stacked bar chart. Each bar represents a date and each segment of a bar represents a single language.
I've figured out how to do this once I have the data in matrix form, where each row represents a date and each column an attribute (or language). So, assuming the data is in that form in a csv:
i.e.
french english spanish
9/1/11 2 1 1
9/2/11 1 1 0
9/3/11 0 0 2
9/5/11 1 2 1
Then I can do:
library(lattice)
directory <- "C:\\test\\language.csv"
ourdata6 <- read.csv(directory)
language <- as.matrix(ourdata6)
barchart(prop.table(language), horizontal = FALSE,
         auto.key = list(space = 'right', cex = .5, border = TRUE, points = FALSE,
                         lines = FALSE, lwd = 5, text = c('french', 'spanish', 'english')),
         main = list(label = "Distribution of classes 10", cex = 2.5),
         ylab = list("", cex = 1.7), xlab.top = list("testing", cex = 1.2))
The challenge is to get the data from the original format into the format I need.
I tried
a<-count(language, c("date", "classes"))
which gives me the counts grouped by both columns, but in a vertical (long) form,
i.e.
9/1/11 french 2
9/1/11 english 1
9/1/11 spanish 1
etc...
I need to pivot this so that there is a single row per date. Also, some of the counts might be zero, so I need placeholders for them, i.e. the first column must correspond to French and the second to English for my current setup to work.
Any ideas on how to do this or if my approach with matrix + prop.table is even correct? Are there any simpler ways of doing this?
Recommended answer
Supposing your data is in a dataframe called df, you can do that with the help of the dplyr and tidyr packages:
library(dplyr)
library(tidyr)
wide <- df %>% select(date,classes) %>%
group_by(date,classes) %>%
summarise(n=n()) %>% # as @akrun said, you can also use tally()
spread(classes, n, fill=0)
Using the example data you provided, this results in the following dataframe:
date english french spanish
9/1/11 1 2 1
9/2/11 1 1 0
9/3/11 0 0 2
9/5/11 2 1 1
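The question also asked for zero placeholders and a fixed column order (French first, then English). As a minimal sketch of one way to get that in base R, not part of the original answer: set the order through factor levels and cross-tabulate with table(). The data frame below is retyped from the sample in the question, and the name counts_matrix is only illustrative.

```r
# Sample data from the question (score column omitted)
df <- data.frame(
  date = c("9/1/11", "9/1/11", "9/1/11", "9/1/11", "9/2/11", "9/2/11",
           "9/3/11", "9/3/11", "9/5/11", "9/5/11", "9/5/11", "9/5/11"),
  classes = c("french", "english", "french", "spanish", "french", "english",
              "spanish", "spanish", "spanish", "english", "french", "english"),
  stringsAsFactors = FALSE
)

# Fix the column order explicitly through the factor levels
df$classes <- factor(df$classes, levels = c("french", "english", "spanish"))

# Cross-tabulate: one row per date, one column per language;
# combinations that never occur are counted as 0
counts <- table(df$date, df$classes)

# Drop the "table" class to get a plain integer matrix
counts_matrix <- unclass(counts)
print(counts_matrix)
```

Note that the dates are plain strings here, so rows are sorted alphabetically. For a longer date range it is probably safer to convert first, for example with as.Date(df$date, format = "%m/%d/%y").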
Now you can make a lattice plot with:
library(lattice)
barchart(date ~ english + french + spanish, data=wide, stack = TRUE,
main = list(label="Distribution of language classes",cex=1.6),
xlab = list("Number of classes", cex=1.1),
ylab = list("Date", cex=1.1),
auto.key = list(space='right',cex=1.2,text=c('English','French','Spanish')))
This gives a horizontal stacked bar chart with one bar per date.
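On the question of whether matrix + prop.table is correct: called without a margin, prop.table() divides every cell by the grand total, so the bars show each date's share of all students. If the intent is the language mix within each date, margin = 1 is probably what is wanted. A small sketch using the count matrix from the question (the names overall and per_date are only illustrative):

```r
# Count matrix from the question: rows are dates, columns are languages
language <- matrix(
  c(2, 1, 1,
    1, 1, 0,
    0, 0, 2,
    1, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("9/1/11", "9/2/11", "9/3/11", "9/5/11"),
                  c("french", "english", "spanish"))
)

# Each cell divided by the grand total of the whole matrix
overall <- prop.table(language)

# Each cell divided by its own row total, so every date sums to 1
per_date <- prop.table(language, margin = 1)

print(round(per_date, 2))
```

Either matrix should be usable with barchart() in the same way as the call in the question. Which one is right depends on whether bar heights should reflect attendance per date or always fill to 100%.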
Instead of using lattice plots, you can also use ggplot2, which is (at least in my opinion) easier to understand. An example:
# convert the wide dataframe to a long one
long <- wide %>% gather(class, n, -date)
# load ggplot2
library(ggplot2)
# create the plot
ggplot(long, aes(date, n, fill=class)) +
geom_bar(stat="identity", position="stack") +
coord_flip() +
theme_bw() +
theme(axis.title=element_blank(), axis.text=element_text(size=12))
This gives a similar horizontal stacked bar chart, with one bar per date and one fill colour per language.
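In more recent tidyr releases, spread() and gather() are superseded by pivot_wider() and pivot_longer(). As a hedged sketch of the same reshaping with those functions, not taken from the original answer: it assumes tidyr 1.1.0 or later (for the scalar values_fill = 0), and the data frame is retyped from the sample in the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (score column omitted)
df <- data.frame(
  date = c("9/1/11", "9/1/11", "9/1/11", "9/1/11", "9/2/11", "9/2/11",
           "9/3/11", "9/3/11", "9/5/11", "9/5/11", "9/5/11", "9/5/11"),
  classes = c("french", "english", "french", "spanish", "french", "english",
              "spanish", "spanish", "spanish", "english", "french", "english"),
  stringsAsFactors = FALSE
)

# Long counts -> wide: one row per date, one column per language,
# with missing date/language combinations filled with 0
wide <- df %>%
  count(date, classes) %>%
  pivot_wider(names_from = classes, values_from = n, values_fill = 0)

# Wide -> long again, in the shape ggplot2 expects
long <- wide %>%
  pivot_longer(-date, names_to = "class", values_to = "n")

print(wide)
```

Here count(date, classes) stands in for the group_by() + summarise(n = n()) pair, and the resulting long data frame can be passed to the ggplot() call unchanged.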