How to do median splits within factor levels in R?

Question

Here I make a new column to indicate whether myData is above or below its median

```r
### MedianSplits based on Whole Data
#create some test data
myDataFrame=data.frame(myData=runif(15),myFactor=rep(c("A","B","C"),5))

#create column showing median split
myBreaks= quantile(myDataFrame$myData,c(0,.5,1))
myDataFrame$MedianSplitWholeData = cut(
    myDataFrame$myData,
    breaks=myBreaks, 
    include.lowest=TRUE,
    labels=c("Below","Above"))

#Check if it's correct
myDataFrame$AboveWholeMedian = myDataFrame$myData > median(myDataFrame$myData)
myDataFrame
```
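
The visual check at the end can be turned into an assertion. A minimal sketch (the `set.seed()` call is an addition for reproducibility): `cut()` builds right-closed intervals and `include.lowest = TRUE` closes the first one, so with 15 distinct values the value equal to the median lands in "Below", which agrees with the strict `>` comparison.

```r
set.seed(1)  # added only to make the random test data reproducible
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))

# Whole-data median split, as in the question
myBreaks <- quantile(myDataFrame$myData, c(0, .5, 1))
myDataFrame$MedianSplitWholeData <- cut(myDataFrame$myData,
                                        breaks = myBreaks,
                                        include.lowest = TRUE,
                                        labels = c("Below", "Above"))
myDataFrame$AboveWholeMedian <- myDataFrame$myData > median(myDataFrame$myData)

# Intervals are [min, median] and (median, max], so "Above" matches "> median"
stopifnot(identical(myDataFrame$MedianSplitWholeData == "Above",
                    myDataFrame$AboveWholeMedian))
```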

Works fine. Now I want to do the same thing, but compute the median splits within each level of myFactor.

I came up with this:

```r
#Median splits within factor levels
byOutput=by(myDataFrame$myData,myDataFrame$myFactor, function (x) {
     myBreaks= quantile(x,c(0,.5,1))
     MedianSplitByGroup=cut(x,
       breaks=myBreaks, 
       include.lowest=TRUE,
       labels=c("Below","Above"))
     MedianSplitByGroup
     })
```

byOutput contains what I want: it categorizes each element of levels A, B, and C correctly. However, I'd like to create a new column, myDataFrame$FactorLevelMedianSplit, that shows the newly computed median split.

How do you convert the output of the "by" command into a useful data-frame column?

I think perhaps the "by" command is not the R-like way to do this...
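
The pieces that `by()` returns are ordered by group, not by row, which is why they cannot be assigned to a column directly. One base-R pattern, not taken from the original thread, is to `split()` the values by `myFactor`, `lapply()` the same median-split code to each piece, and `unsplit()` the results back into the original row order. The helper name `medianSplit` is hypothetical, and `set.seed()` is added only for reproducibility.

```r
set.seed(42)
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))

# Hypothetical helper: the question's median split applied to one numeric vector
medianSplit <- function(x) {
  myBreaks <- quantile(x, c(0, .5, 1))
  cut(x, breaks = myBreaks, include.lowest = TRUE, labels = c("Below", "Above"))
}

# split(): one list element per factor level (in level order)
# lapply(): median split within each level
# unsplit(): scatter the per-level results back to the original rows
perGroup <- lapply(split(myDataFrame$myData, myDataFrame$myFactor), medianSplit)
myDataFrame$FactorLevelMedianSplit <- unsplit(perGroup, myDataFrame$myFactor)
myDataFrame
```

Because `unsplit()` uses the same grouping vector as `split()`, each value ends up in the row it came from.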

Update:

With Thierry's example of how to use factor() cleverly, and upon discovering the "ave" function in Spector's book, I've found this solution, which requires no additional packages.

```r
myDataFrame$MediansByFactor=ave(
    myDataFrame$myData,
    myDataFrame$myFactor,
    FUN=median)

myDataFrame$FactorLevelMedianSplit = factor(
    myDataFrame$myData>myDataFrame$MediansByFactor, 
    levels = c(TRUE, FALSE), 
    labels = c("Above", "Below"))
```
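
The `ave()` approach can be wrapped in a small reusable function. This is a sketch; the name `medianSplitBy` and the toy data are hypothetical. `ave()` returns a vector the same length as `x` in which each element is replaced by its group's median, so a plain element-wise comparison does the split. Because the comparison is strict, a value equal to its group median (possible when a group has an odd number of values) is labelled "Below".

```r
# Hypothetical helper wrapping the ave() solution above
medianSplitBy <- function(x, g) {
  groupMedians <- ave(x, g, FUN = median)  # each element replaced by its group's median
  factor(x > groupMedians,
         levels = c(TRUE, FALSE),
         labels = c("Above", "Below"))
}

# Toy data: group A has median 2, group B has median 25
x <- c(1, 2, 3, 10, 20, 30, 40)
g <- c("A", "A", "A", "B", "B", "B", "B")
medianSplitBy(x, g)
```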

Answer

Here is a solution using the plyr package.

```r
myDataFrame <- data.frame(myData=runif(15),myFactor=rep(c("A","B","C"),5))
library(plyr)
ddply(myDataFrame, "myFactor", function(x){
    x$Median <- median(x$myData)
    x$FactorLevelMedianSplit <- factor(x$myData <= x$Median, levels = c(TRUE, FALSE), labels = c("Below", "Above"))
    x
})
```
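
`ddply()` follows the split-apply-combine pattern: split the data frame by `myFactor`, run the function on each piece, and stack the results. A rough base-R equivalent is sketched below. It illustrates the pattern and does not describe plyr's internals; `addMedianSplit` is a hypothetical name for the anonymous function in the answer. Like `ddply()`, it returns the rows grouped by factor level rather than in the original order, so `ave()` is the simpler choice when the new column must line up with the existing rows.

```r
set.seed(1)  # added only for reproducibility
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))

# Hypothetical name for the per-piece function used in the plyr answer
addMedianSplit <- function(x) {
  x$Median <- median(x$myData)
  x$FactorLevelMedianSplit <- factor(x$myData <= x$Median,
                                     levels = c(TRUE, FALSE),
                                     labels = c("Below", "Above"))
  x
}

# Split by level, apply to each piece, combine with rbind()
pieces <- lapply(split(myDataFrame, myDataFrame$myFactor), addMedianSplit)
result <- do.call(rbind, pieces)
rownames(result) <- NULL  # drop the "A.1", "A.4", ... row names created by rbind
result
```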