How do I sort a dataframe by the average of subsets of one of the rows?


Problem description

I'm fairly new to R, but I'm making good progress. I've been able to bend ggplot2 to my will with the exception of one thing: the order that the categorical labels are plotted along the x axis in my boxplot. I think this is just a hole in my knowledge of how to address ranges of a dataframe in formulas, but here's the fake data, as a dataframe called df:

Index    Label    Value
index1   A        1
index2   A        2
index3   A        3
index4   B        12
index5   B        11
index6   B        10
index7   C        8
index8   C        7
index9   C        9
...
index76  Z        15
index77  Z        17
index78  Z        16

My plot code looks like

qplot(df$Label, df$Value, data=df) + scale_x_discrete("Label") + opts(axis.text.x = theme_text(angle = 90, hjust = 0, size=7)) + geom_boxplot()

and gives me exactly what I want, which is a boxplot showing one box and whiskers for label A, one for B, and one for C. However, the axis goes in the order of the labels (the boxplot of 1,2,3 being closest to the origin, 10,11,12 in the middle, 7,8,9 on the right of the graph). What I want is for the boxplot data to start with the subset that has the highest within-label average and proceed in decreasing order.

I can average within each label by mean(df$Value[1:3]) and mean(df$Value[4:6]) etc., but I can't figure out how to get the graph to display such that the plots for the labels go not in the order they appear in factor(df$Label) (i.e. A, B, C along the x with boxes at 2, 11, 8) but in order of highest within-label average to lowest (i.e. B, C, A along the x and the boxes then at 11, 8, 2).

I'm thinking I would create a vector consisting of each within-label average and somehow pass that to ggplot to specify the axis order, but I can't figure out how to create the vector to start with.

What I need to know is:

What's the best way to get a vector consisting of the averages of each label, in order from highest to lowest?
How do I pass that vector to ggplot so that it orders the x-axis by those values, while still labeling the x axis with factor(df$Label)?

I'm open to suggestions for other ways to display the data as well, but I think I'm pretty close to what I want, and the mean and spread of the values within a given label are important.

Solution

Here is one way to do it:

library(ggplot2)

# create a dummy data frame
set.seed(1234)
df = data.frame(
       label = rep(letters[1:3], each = 3),
       value = sample(100, 9))

# boxplot without sorting
qplot(label, value, data = df, geom = 'boxplot')

# boxplot with labels sorted by the median of value (ascending order)
qplot(reorder(label, value, median), value, data = df, geom = 'boxplot')
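
The answer above sorts by the median in ascending order, while the question asks for the mean from highest to lowest. A minimal sketch of that variant follows, assuming the column names Label and Value from the question's table and rebuilding only its first nine rows as sample data. It uses ggplot() with theme() and element_text(), since opts() and theme_text() have been removed from current ggplot2 releases; the names label_means and p are chosen here for illustration.

```r
library(ggplot2)

# Sample data shaped like the question's table (first nine rows only;
# the column names Label and Value are taken from the question)
df <- data.frame(
  Label = rep(c("A", "B", "C"), each = 3),
  Value = c(1, 2, 3, 12, 11, 10, 8, 7, 9)
)

# Named vector of within-label means, sorted from highest to lowest
label_means <- sort(tapply(df$Value, df$Label, mean), decreasing = TRUE)

# Use the names of that vector as the factor levels; ggplot2 draws
# discrete axes in level order, so this fixes the x-axis order
df$Label <- factor(df$Label, levels = names(label_means))

# Boxplot with labels ordered by descending mean
p <- ggplot(df, aes(x = Label, y = Value)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 0, size = 7))

print(p)
```

If the sorted vector itself is not needed, calling reorder(Label, -Value, FUN = mean) inside aes() should give the same ordering in one step, because reorder() sorts levels in ascending order of the summary and negating the values reverses that.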
