How to categorize a continuous variable in 4 groups of the same size in R?

Problem description

I need to categorize a continuous variable into 4 classes, each with the same number of observations. I have used the function

cut(x, breaks = quantile(x, probs = seq(0, 1, 0.25)), include.lowest = TRUE, right = FALSE)

My problem is that the number of observations in each category is not exactly the same, because some observations (more than one) have exactly the same value as a quantile. How can I do it?

My variable is waiting:

[1] 79 54 74 62 85 55 88 85 51 85 54 84 78 47 83 52 62 84 52 79 51 47 78 69 74
[26] 83 55 76 78 79 73 77 66 80 74 52 48 80 59 90 80 58 84 58 73 83 64 53 82 59
[51] 75 90 54 80 54 83 71 64 77 81 59 84 48 82 60 92 78 78 65 73 82 56 79 71 62
[76] 76 60 78 76 83 75 82 70 65 73 88 76 80 48 86 60 90 50 78 63 72 84 75 51 82
[101] 62 88 49 83 81 47 84 52 86 81 75 59 89 79 59 81 50 85 59 87 53 69 77 56 88
[126] 81 45 82 55 90 45 83 56 89 46 82 51 86 53 79 81 60 82 77 76 59 80 49 96 53
[151] 77 77 65 81 71 70 81 93 53 89 45 86 58 78 66 76 63 88 52 93 49 57 77 68 81
[176] 81 73 50 85 74 55 77 83 83 51 78 84 46 83 55 81 57 76 84 77 81 87 77 51 78
[201] 60 82 91 53 78 46 77 84 49 83 71 80 49 75 64 76 53 94 55 76 50 82 54 75 78
[226] 79 78 78 70 79 70 54 86 50 90 54 54 77 79 64 75 47 86 63 85 82 57 82 67 74
[251] 54 83 73 73 88 80 71 83 56 79 78 84 58 83 43 60 75 81 46 90 46 74

This variable is in the faithful dataset in R. It has 272 observations, so it is divisible by 4, giving 68 observations in each category.

I have used

newwait<-cut(waiting, breaks =quantile(waiting,probs=seq(0,1,0.25)),include.lowest=TRUE,right=FALSE)

table(newwait)
newwait
[43,58) [58,76) [76,82) [82,96] 
     66      68      67      71 

As you can see, the number of observations in each group is similar but not exactly the same.

Recommended answer

Basically, it sounds like you need to deal with ties. You also need a vector whose length is divisible by 4, but I'll assume you know that.
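To see the ties in question, one can count how many observations sit exactly on each quantile break. This is a small sketch on the faithful$waiting data from the question; the names breaks and ties_at_breaks are illustrative and not from the original post.

```r
# Data from the question: the built-in faithful dataset
waiting <- faithful$waiting

# The same break points that were passed to cut() in the question
breaks <- quantile(waiting, probs = seq(0, 1, 0.25))

# For each break point, count the observations equal to it.
# Any count above 1 means several observations share that boundary value,
# and cut() puts all of them on the same side of the boundary.
ties_at_breaks <- vapply(breaks, function(b) sum(waiting == b), integer(1))
ties_at_breaks
```

Because every observation with a boundary value falls into a single interval, the group sizes cannot be balanced without breaking those ties somehow.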

Here's a solution using the tie-breaking option (ties.method) of rank:

set.seed(1)
x <- round(runif(1000,0,1),1)
table(x)
## x
##   0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1 
##  43 106  95 103 112 109  82 102  95 100  53

y <- rank(x, ties.method='first') # <- this forces tie breaks
cuts <- cut(y, breaks = quantile(y,probs=seq(0,1,0.25)),
               include.lowest=TRUE,
               right=FALSE)
# check that cuts are all the same length:
lapply(split(x,cuts), length)
$`[1,251)`
[1] 250

$`[251,500)`
[1] 250

$`[500,750)`
[1] 250

$`[750,1e+03]`
[1] 250
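The same idea can be applied to the waiting variable from the question. This is a minimal sketch rather than the answerer's code: it assumes the built-in faithful dataset, and it replaces the cut/quantile step with a direct ceiling() on the ranks. The names n_groups and group are illustrative.

```r
# Data from the question: 272 observations, divisible by 4
waiting <- faithful$waiting

# ties.method = "first" gives every observation a unique rank 1..n,
# breaking ties by order of appearance in the vector
r <- rank(waiting, ties.method = "first")

# Map ranks 1..68 to group 1, 69..136 to group 2, and so on
n_groups <- 4
group <- ceiling(r * n_groups / length(r))

# Check the group sizes
table(group)
```

Each group then holds 272 / 4 = 68 observations. Note that ties.method = "first" breaks ties by position in the vector, so observations with equal waiting values can end up in different groups.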
