How to deal with overlapping factor levels? (e.g. when producing tables and plots)

Problem description

I am facing a problem with a dataset that has overlapping factor levels.

I would like to produce timelines, barplots and statistics by factor level. However, I want the factor levels to be non-exclusive: observations belonging to more than one level should appear several times in a plot.

Here is an example of what my data structure looks like:

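# Column names (called col_names to avoid masking base::head)
col_names <- c("ID", "YEAR", "BRAZIL", "GERMANY", "US", "FRANCE")
# One row per observation; the country columns are 0/1 indicators
data <- data.frame(matrix(c(1, 2000, 1, 0, 0, 0,
                            2, 2010, 0, 1, 1, 0,
                            3, 2011, 0, 1, 0, 0,
                            4, 2012, 1, 0, 0, 1,
                            5, 2012, 0, 1, 0, 0,
                            6, 2013, 0, 0, 0, 1),
                          nrow = 6, ncol = 6, byrow = TRUE))
names(data) <- col_names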

Obviously, a factor variable "COUNTRY" cannot be created the usual way. Doing so would force the factor levels to be mutually exclusive (in our case there would be 4 levels: Brazil, Germany, US and France):

data$COUNTRY[data$BRAZIL==1 & 
             data$GERMANY==0 & 
             data$US==0 & 
             data$FRANCE==0]  <- "Brazil"
data$COUNTRY[data$BRAZIL==0 & 
             data$GERMANY==1 & 
             data$US==0 & 
             data$FRANCE==0]  <- "Germany"

etc...

factor(data$COUNTRY)

But this is not what I want...

My problem is that plotting by factor only works if the factor levels are mutually exclusive. I would like to produce something like this:

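library(ggplot2)
# qplot() is deprecated in current ggplot2, and stat = "bin" no longer accepts a y aesthetic;
# geom_count() sizes each point by the number of observations at that position
MYPLOT <- ggplot(data, aes(x = YEAR, y = COUNTRY))
MYPLOT + geom_count() + scale_size(range = c(0, 15))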

with observations belonging to i factor levels appearing i times in the plot.


  • How should I transform my data.frame in order to get what I want?
  • Should I simply duplicate those observations belonging to i factor levels i times? If yes, how should I do that?
  • Is there a workaround that does not require duplicating cases?

Any ideas, anyone?

Recommended answer

I think you have to duplicate those rows so that each observation appears once for every level it belongs to, and then remove any rows with a value of 0.

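library(reshape2)
library(ggplot2)
# Reshape to long format: one row per (ID, YEAR, country) combination
d2 <- melt(data, id.vars = c("ID", "YEAR"))
# Keep only the countries each observation actually belongs to
d3 <- d2[d2$value != 0, ]
# qplot() is deprecated in current ggplot2, so build the plot with ggplot()
ggplot(d3, aes(x = YEAR, y = variable)) + geom_point()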
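
On the third question (a workaround without duplicating cases): for simple per-level counts the indicator columns can be summed directly, while a year-by-country table still needs one row per membership. Below is a minimal base R sketch of both, assuming the 0/1 layout from the question. The names `countries`, `level_counts`, `long` and `year_by_country` are illustrative and not part of the original post.

```r
# Example data from the question, rebuilt column by column
data <- data.frame(
  ID      = 1:6,
  YEAR    = c(2000, 2010, 2011, 2012, 2012, 2013),
  BRAZIL  = c(1, 0, 0, 1, 0, 0),
  GERMANY = c(0, 1, 1, 0, 1, 0),
  US      = c(0, 1, 0, 0, 0, 0),
  FRANCE  = c(0, 0, 0, 1, 0, 1)
)
countries <- c("BRAZIL", "GERMANY", "US", "FRANCE")  # assumed indicator columns

# Counts per country without duplicating rows: sum the indicator columns
level_counts <- colSums(data[countries])

# Long format in base R: one row per (observation, country) membership
member <- as.matrix(data[countries]) == 1
idx <- which(member, arr.ind = TRUE)  # columns "row" (observation) and "col" (country)
long <- data.frame(
  ID      = data$ID[idx[, "row"]],
  YEAR    = data$YEAR[idx[, "row"]],
  COUNTRY = factor(countries[idx[, "col"]], levels = countries)
)

# Year-by-country table; an observation in i countries is counted i times
year_by_country <- table(long$YEAR, long$COUNTRY)
```

The `long` data frame plays the same role as `d3` in the answer above, so it can be passed to the same plotting code with `COUNTRY` in place of `variable`.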
