Divide a range of values into bins of equal length: cut vs cut2

Question

I'm using the cut function to split my data into equal-width bins. It does the job, but I'm not happy with the values it returns: what I need is the center of each bin, not its lower and upper ends.
I've also tried cut2{Hmisc}. This gives me the center of each bin, but it divides the range of the data into bins that contain the same number of observations, rather than bins of the same length.

Does anyone have a solution to this?

Recommended answer

It's not too hard to compute the breaks and midpoints yourself, with something like the function below. Since each midpoint is a single number, it returns a numeric vector of midpoints rather than a factor with labels.

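cut2 <- function(x, breaks) {
  # Returns, for each value of x, the midpoint of the equal-width bin it falls in.
  r <- range(x, na.rm = TRUE)
  # 2*breaks + 1 evenly spaced points: odd positions are the bin edges,
  # even positions are the bin midpoints
  b <- seq(r[1], r[2], length.out = 2 * breaks + 1)
  brk <- b[0:breaks * 2 + 1]
  mid <- b[1:breaks * 2]
  # include.lowest = TRUE puts min(x) in the first bin
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  mid[k]
}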

There's probably a better way to get the bin breaks and midpoints; I didn't think about it very hard.
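One more direct option, sketched here as an assumption rather than taken from the thread: build breaks + 1 equally spaced edges and average each pair of neighbouring edges to get the bin centers. The helper name bin_midpoints is hypothetical (it is not part of base R or Hmisc).

```r
# Hypothetical helper (not from the original answer or any package):
# equal-width bins, returning the center of the bin each value falls in.
bin_midpoints <- function(x, breaks) {
  r <- range(x, na.rm = TRUE)
  # breaks + 1 equally spaced edges spanning the data range
  brk <- seq(r[1], r[2], length.out = breaks + 1)
  # center of each bin = average of its lower and upper edge
  mid <- (head(brk, -1) + tail(brk, -1)) / 2
  # include.lowest = TRUE makes the first bin closed on the left, so min(x) is kept
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  mid[k]
}

x <- c(0, 5, 10, 15, 20)
bin_midpoints(x, 4)
```

With four bins over 0 to 20 the edges are 0, 5, 10, 15, 20, so this returns 2.5, 2.5, 7.5, 12.5, 17.5 (bins are closed on the right by default, so 5 lands in the first bin).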

Note that this answer is different from Joshua's: his gives the median of the data in each bin, while this one gives the center of each bin. For example, with three bins (the second command computes the median per bin):

> head(cut2(x,3))
[1] 16.666667  3.333333 16.666667  3.333333 16.666667 16.666667
> head(ave(x, cut(x,3), FUN=median))
[1] 18  2 18  2 18 18
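The x behind the output above isn't shown in the thread. Here is a self-contained comparison on made-up data (the values are hypothetical), assigning bins once so that the only difference between the two results is bin center versus median of the data in the bin:

```r
# Hypothetical data, not from the original thread
x <- c(0, 1, 2, 8, 9, 19, 20)

# Two equal-width bins over the data range: [0, 10] and (10, 20]
brk <- seq(min(x), max(x), length.out = 3)
k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)

# Center of each bin (what the answer returns)
mid <- (head(brk, -1) + tail(brk, -1)) / 2
centers <- mid[k]

# Median of the observations in each bin (the median-per-bin approach)
medians <- ave(x, k, FUN = median)

data.frame(x, centers, medians)
```

Here centers is 5 for the first five values and 15 for the last two, while medians is 2 and 19.5, because the data are not spread evenly within each bin.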