How to reshape data for a stacked barchart using R lattice

Question

I have a bunch of data in a table (imported from a CSV file) in the following format:

date        classes         score
9/1/11       french          34
9/1/11       english         34
9/1/11       french          34
9/1/11       spanish         34
9/2/11       french          34
9/2/11       english         34
9/3/11       spanish         34
9/3/11       spanish         34
9/5/11       spanish         34
9/5/11       english         34
9/5/11       french          34
9/5/11       english         34

Ignore the score column; it is not important.
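
For anyone who wants to reproduce this without the CSV file, the sample rows above can be typed in directly as a data frame. This is only a sketch: the name `df` is an assumption chosen for illustration, and `score` is filled with the constant 34 shown in the table.

```r
# Sample data from the table above, rebuilt by hand.
# `df` is a hypothetical name; the real data comes from a CSV file.
df <- data.frame(
  date    = rep(c("9/1/11", "9/2/11", "9/3/11", "9/5/11"),
                times = c(4, 2, 2, 4)),
  classes = c("french", "english", "french", "spanish",   # 9/1/11
              "french", "english",                        # 9/2/11
              "spanish", "spanish",                       # 9/3/11
              "spanish", "english", "french", "english"), # 9/5/11
  score   = 34,                 # constant in the sample; recycled to all rows
  stringsAsFactors = FALSE
)

str(df)
```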

I need a tally of the number of students taking English, Spanish, or French classes on each date. That is, I need to first group the data by date, then split each day into blocks by language, and plot the result as a stacked bar chart. Each bar represents a date, and each segment of a bar represents a single language.

I've figured out how to do this once I have the data in matrix form, where each row represents a date and each column a language. So, assuming the data is in that form in a CSV file:

date         french      english       spanish
9/1/11       2           1             1
9/2/11       1           1             0
9/3/11       0           0             2
9/5/11       1           2             1

Then I can do:

library(lattice)

directory <- "C:\\test\\language.csv"
ourdata6 <- read.csv(directory, row.names = 1)  # dates become row names, so the matrix stays numeric

language <- as.matrix(ourdata6)

barchart(prop.table(language), horizontal = FALSE,
         auto.key = list(space = 'right', cex = .5, border = TRUE, points = FALSE, lines = FALSE,
                         lwd = 5, text = c('french', 'english', 'spanish')),  # same order as the columns
         main = list(label = "Distribution of classes 10", cex = 2.5),
         ylab = list("", cex = 1.7), xlab.top = list("testing", cex = 1.2))

The challenge is getting the data from the original format into the format I need.

I tried:

a<-count(language, c("date", "classes"))

This gives me the counts for each combination of date and class, but in vertical (long) form, i.e.:
9/1/11       french           2             
9/1/11       english          1                       
9/1/11       spanish          1            
etc...

I need to pivot this so that there is a single row per date. Also, some of the counts may be zero, so I need placeholders for them: for my current setup to work, the first column must correspond to French and the second to English.

Any ideas on how to do this, or on whether my approach with matrix + prop.table is even correct? Is there a simpler way of doing this?

Recommended answer

Supposing your data is in a data frame called df, you can do this with the help of the dplyr and tidyr packages:

library(dplyr)
library(tidyr)

wide <- df %>% select(date,classes) %>%
  group_by(date,classes) %>%
  summarise(n=n()) %>%            # as @akrun said, you can also use tally()
  spread(classes, n, fill=0)

Using the example data you provided, this results in the following data frame:

  date english french spanish
9/1/11       1      2       1
9/2/11       1      1       0
9/3/11       0      0       2
9/5/11       2      1       1
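
In recent tidyr releases, spread() is marked as superseded in favour of pivot_wider(). The following is a minimal sketch of the same reshaping with count() and pivot_wider(), assuming a recent version of dplyr and tidyr. The names `demo` and `wide2` are made up for this example, and the data is retyped from the question so that the block runs on its own:

```r
library(dplyr)
library(tidyr)

# Stand-in for df: the date and classes columns from the question
demo <- data.frame(
  date    = rep(c("9/1/11", "9/2/11", "9/3/11", "9/5/11"),
                times = c(4, 2, 2, 4)),
  classes = c("french", "english", "french", "spanish",
              "french", "english",
              "spanish", "spanish",
              "spanish", "english", "french", "english")
)

wide2 <- demo %>%
  count(date, classes) %>%            # one row per date/class pair, count in column n
  pivot_wider(names_from  = classes,  # one column per language
              values_from = n,
              values_fill = 0)        # 0 as placeholder for absent combinations

wide2
```

This should give the same four rows as the table above, with zeros filled in for the languages that do not occur on a given date.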

Now you can make a lattice plot with:

library(lattice)

barchart(date ~ english + french + spanish, data=wide, stack = TRUE,
         main = list(label="Distribution of language classes",cex=1.6),
         xlab = list("Number of classes", cex=1.1),
         ylab = list("Date", cex=1.1),
         auto.key = list(space='right',cex=1.2,text=c('English','French','Spanish')))

This gives a horizontal stacked bar chart with one bar per date and one segment per language.
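
As for the matrix + prop.table approach from the question: base R's table() can build the count matrix directly, and fixing the factor levels controls the column order while keeping zero counts as placeholders. A minimal sketch under those assumptions follows; `demo`, `counts`, `props` and `p` are names invented for this example. Note that prop.table() without a margin divides by the grand total, whereas margin = 1 makes every bar sum to 1.

```r
library(lattice)

# Stand-in for the original data: date and classes columns from the question
demo <- data.frame(
  date    = rep(c("9/1/11", "9/2/11", "9/3/11", "9/5/11"),
                times = c(4, 2, 2, 4)),
  classes = c("french", "english", "french", "spanish",
              "french", "english",
              "spanish", "spanish",
              "spanish", "english", "french", "english")
)

# Fix the column order explicitly: french first, english second, spanish third
demo$classes <- factor(demo$classes, levels = c("french", "english", "spanish"))

# Dates in rows, languages in columns; absent combinations are counted as 0
counts <- table(demo$date, demo$classes)
counts

# Row-wise proportions, so that each date's bar sums to 1
props <- prop.table(counts, margin = 1)

# barchart() has a method for tables; bars are stacked by default
p <- barchart(props, horizontal = FALSE, stack = TRUE,
              auto.key = list(space = "right"),
              ylab = "Proportion of classes")
print(p)
```

Because the legend is taken from the column names of the table, there is no need to pass the labels by hand, which avoids mismatches between labels and columns.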

Instead of lattice, you can also use ggplot2, which is (at least in my opinion) easier to understand. An example:

# convert the wide dataframe to a long one
long <- wide %>% gather(class, n, -date)

# load ggplot2
library(ggplot2)

# create the plot
ggplot(long, aes(date, n, fill=class)) +
  geom_bar(stat="identity", position="stack") +
  coord_flip() +
  theme_bw() +
  theme(axis.title=element_blank(), axis.text=element_text(size=12))

This gives a similar horizontal stacked bar chart (coord_flip() puts the dates on the vertical axis).
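
Since the question worked with proportions (prop.table), the ggplot2 version can also show shares per date instead of raw counts. The sketch below assumes a recent tidyr and ggplot2; it uses pivot_longer(), the newer counterpart of gather(), and position = "fill" to scale every bar to 1. The names `wide_demo`, `long_demo` and `p` are invented for this example, and the counts are retyped from the wide table above.

```r
library(tidyr)
library(ggplot2)

# Stand-in for the wide data frame shown earlier
wide_demo <- data.frame(
  date    = c("9/1/11", "9/2/11", "9/3/11", "9/5/11"),
  english = c(1, 1, 0, 2),
  french  = c(2, 1, 0, 1),
  spanish = c(1, 0, 2, 1)
)

# Wide to long: one row per date/language pair
long_demo <- pivot_longer(wide_demo, cols = -date,
                          names_to = "class", values_to = "n")

# geom_col() plots the values as given; position = "fill" rescales
# each bar so that its segments sum to 1
p <- ggplot(long_demo, aes(date, n, fill = class)) +
  geom_col(position = "fill") +
  labs(x = "Date", y = "Proportion of classes") +
  theme_bw()
print(p)
```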
