How to deal with overlapping factor levels? (e.g. when producing tables and plots)

Views: 79 | Posted: 2020/10/17 0:29:56 | Tags: r, dataframe, ggplot2, r-factor


Problem description

I am facing a problem with a dataset that has overlapping factor levels.

I would like to produce timelines, bar plots and statistics by factor level. However, I want the factor levels to be non-exclusive: an observation that belongs to more than one level should appear several times in a plot, once for each level it belongs to.

Here is an example of what my data structure looks like:

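# Column names (stored as col_names so that base::head is not masked)
col_names <- c("ID","YEAR","BRAZIL","GERMANY","US","FRANCE")
# One row per observation; the country columns are 0/1 membership indicators
data <- data.frame(matrix(c(1,2000,1,0,0,0,
                            2,2010,0,1,1,0,
                            3,2011,0,1,0,0,
                            4,2012,1,0,0,1,
                            5,2012,0,1,0,0,
                            6,2013,0,0,0,1),
                         nrow=6, ncol=6, byrow=TRUE))
names(data) <- col_names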

Obviously, a factor variable "COUNTRY" cannot be created the usual way. Doing so would force the factor levels to be mutually exclusive (in our case there would be 4 levels: Brazil, Germany, US and France):

data$COUNTRY[data$BRAZIL==1 & 
             data$GERMANY==0 & 
             data$US==0 & 
             data$FRANCE==0]  <- "Brazil"
data$COUNTRY[data$BRAZIL==0 & 
             data$GERMANY==1 & 
             data$US==0 & 
             data$FRANCE==0]  <- "Germany"

etc...

factor(data$COUNTRY)

But this is not what I want.
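To make the limitation concrete, here is a minimal sketch of what the mutually exclusive coding does to the example data. The helper names `countries`, `n_memberships` and `exclusive` are mine and not part of the original question, and the data frame is rebuilt column by column so the snippet runs on its own. Observations that belong to two countries get no level at all.

```r
# Rebuild the example data (same values as in the question)
data <- data.frame(
  ID      = 1:6,
  YEAR    = c(2000, 2010, 2011, 2012, 2012, 2013),
  BRAZIL  = c(1, 0, 0, 1, 0, 0),
  GERMANY = c(0, 1, 1, 0, 1, 0),
  US      = c(0, 1, 0, 0, 0, 0),
  FRANCE  = c(0, 0, 0, 1, 0, 1)
)

# Hypothetical helper: names of the 0/1 indicator columns
countries <- c("BRAZIL", "GERMANY", "US", "FRANCE")

# How many countries each observation belongs to
n_memberships <- rowSums(data[countries])

# Exclusive coding: only rows with exactly one membership get a level
exclusive <- rep(NA_character_, nrow(data))
single <- n_memberships == 1
exclusive[single] <- countries[
  max.col(as.matrix(data[single, countries]), ties.method = "first")
]

# Rows with more than one membership (IDs 2 and 4) are left as NA
data$ID[is.na(exclusive)]
```

Any table or plot built on such a factor would silently drop those rows, which is why a different data layout is needed.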

My problem is that plotting by factor only works if the factor levels are unambiguous, i.e. each observation has exactly one level. I would like to produce something like this:

library(ggplot2)
# Bubble plot: one point per YEAR/COUNTRY combination, sized by the number of observations
MYPLOT <- ggplot(data, aes(x = YEAR, y = COUNTRY))
MYPLOT + geom_count() + scale_size(range = c(0, 15))

with observations belonging to i factor levels appearing i times in the plot.

How should I transform my data.frame in order to get what I want?
Should I simply duplicate the observations that belong to i factor levels i times? If yes, how should I do that?
Is there a workaround that does not require duplicating cases?

Any ideas, anyone?
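One possible approach, sketched here in base R and not taken from an accepted answer, is to reshape the indicator columns into a "long" layout with one row per observation/country pair. An observation belonging to i countries then occupies i rows, and COUNTRY becomes an ordinary factor. The names `countries`, `long` and `counts` are illustrative assumptions. The plot is only built if ggplot2 is installed.

```r
# Rebuild the example data (same values as in the question)
data <- data.frame(
  ID      = 1:6,
  YEAR    = c(2000, 2010, 2011, 2012, 2012, 2013),
  BRAZIL  = c(1, 0, 0, 1, 0, 0),
  GERMANY = c(0, 1, 1, 0, 1, 0),
  US      = c(0, 1, 0, 0, 0, 0),
  FRANCE  = c(0, 0, 0, 1, 0, 1)
)
countries <- c("BRAZIL", "GERMANY", "US", "FRANCE")

# For each country, keep the rows flagged 1 and label them with that country;
# stacking the pieces duplicates multi-country observations as needed
long <- do.call(rbind, lapply(countries, function(cn) {
  rows <- data[data[[cn]] == 1, c("ID", "YEAR")]
  rows$COUNTRY <- cn
  rows
}))
rownames(long) <- NULL
long$COUNTRY <- factor(long$COUNTRY, levels = countries)

# Statistics by (overlapping) factor level
table(long$COUNTRY)

# If only per-country counts are needed, no duplication is required:
# summing the indicator columns gives the same numbers
counts <- colSums(data[countries])

# Bubble plot of the long data; geom_count() sizes points by frequency
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = YEAR, y = COUNTRY)) +
    ggplot2::geom_count() +
    ggplot2::scale_size_area(max_size = 15)
  # print(p) to display the plot
}
```

The long layout is what most plotting and tabulation functions expect, so the duplication is usually harmless as long as the ID column is kept to trace rows back to the original observations.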
