How to deal with overlapping factor levels? (e.g. when producing tables and plots)
Problem description
I am facing a problem with a dataset that has overlapping factor levels.
I would like to produce timelines, barplots and statistics by factor level. However, I want the factor levels to be non-exclusive: observations belonging to more than one level should appear several times in a plot.
Here is an example of what my data structure looks like:
head <- c("ID","YEAR","BRAZIL","GERMANY","US","FRANCE")
data <- data.frame(matrix(c(1,2000,1,0,0,0,
2,2010,0,1,1,0,
3,2011,0,1,0,0,
4,2012,1,0,0,1,
5,2012,0,1,0,0,
6,2013,0,0,0,1),
nrow=6, ncol=6, byrow=T))
names(data) <- head
Obviously, a possible factor variable "COUNTRY" cannot be created the usual way. It would force factor levels to be clear-cut (in our case there would be 4 levels: Brazil, Germany, US and France):
data$COUNTRY[data$BRAZIL==1 &
data$GERMANY==0 &
data$US==0 &
data$FRANCE==0] <- "Brazil"
data$COUNTRY[data$BRAZIL==0 &
data$GERMANY==1 &
data$US==0 &
data$FRANCE==0] <- "Germany"
# etc. for the remaining levels
factor(data$COUNTRY)
But this is not what I want.
My problem is that plotting by factor only works if the factor levels are properly unambiguous. I would like to produce something like this:
require(ggplot2)
MYPLOT <- qplot(data$YEAR, data$COUNTRY)
MYPLOT + geom_point(aes(size=..count..), stat="bin") + scale_size(range=c(0, 15))
with observations belonging to i factor levels appearing i times in the plot.
- How should I transform my data.frame in order to get what I want?
- Should I simply duplicate those observations belonging to i factor levels i times? If yes, how should I do that?
- Is there a workaround that does not require duplicating cases?
Any ideas, anyone?
Recommended answer
I think you have to duplicate those rows so that each observation is represented once per level it belongs to, and then remove any rows with a value of 0.
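library(reshape2)
library(ggplot2)
# Reshape to long format: one row per ID/YEAR/country-column combination
d2 <- melt(data, id.vars = c("ID", "YEAR"))
# Keep only the rows where the observation actually belongs to that country
d3 <- d2[d2$value != 0, ]
qplot(d3$YEAR, d3$variable)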
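The same reshaping can also be sketched in base R, which may help when reshape2 is not available and shows how the per-level statistics follow from the long format. This is only an illustration under the assumption that the data look exactly like the example in the question. The names `countries`, `long`, `counts_by_country`, `counts_by_year_country` and `counts_no_dup` are made up here and do not come from the original post.

```r
# Rebuild the example data from the question (0/1 indicator column per country).
data <- data.frame(
  ID      = 1:6,
  YEAR    = c(2000, 2010, 2011, 2012, 2012, 2013),
  BRAZIL  = c(1, 0, 0, 1, 0, 0),
  GERMANY = c(0, 1, 1, 0, 1, 0),
  US      = c(0, 1, 0, 0, 0, 0),
  FRANCE  = c(0, 0, 0, 1, 0, 1)
)

# Assumed: these are the indicator columns that define the overlapping levels.
countries <- c("BRAZIL", "GERMANY", "US", "FRANCE")

# Long format: repeat every observation once per country column,
# then keep only the rows whose indicator is 1.
long <- data.frame(
  ID      = rep(data$ID, times = length(countries)),
  YEAR    = rep(data$YEAR, times = length(countries)),
  COUNTRY = factor(rep(countries, each = nrow(data)), levels = countries),
  value   = unlist(data[countries], use.names = FALSE)
)
long <- long[long$value == 1, c("ID", "YEAR", "COUNTRY")]

# Statistics by (overlapping) level: an observation with i levels is counted i times.
counts_by_country      <- table(long$COUNTRY)
counts_by_year_country <- table(long$YEAR, long$COUNTRY)

# For plain counts per level, no duplication is needed at all.
counts_no_dup <- colSums(data[countries])
```

With `long` in hand, `COUNTRY` is an ordinary factor, so the usual plotting and tabulation tools apply. Recent ggplot2 versions also provide `geom_count()`, which sizes points by the number of overlapping observations and may be a closer match for the bubble plot sketched in the question.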