How to cut data into even pieces in R?

Problem description

Here is the problem: I have a dataset, let's say:

a <- c(0,0,0,0,1,1,1,1,1,1)

I want to cut it into even pieces (e.g. 5 pieces). The problem is that I cannot use quantile or cut on the values, because some values repeat, so distinct breakpoints cannot be set.

> quantile(a)
  0%  25%  50%  75% 100% 
   0    0    1    1    1 

(repeated breakpoints)

> cut(a, 5)
 [1] (-0.001,0.199] (-0.001,0.199] (-0.001,0.199] (-0.001,0.199] (0.801,1]     
 [6] (0.801,1]      (0.801,1]      (0.801,1]      (0.801,1]      (0.801,1]     
Levels: (-0.001,0.199] (0.199,0.4] (0.4,0.6] (0.6,0.801] (0.801,1]

(only two of the five levels are used)

I know I can produce a vector like this:

b <- c(1,1,2,2,3,3,4,4,5,5)

and use it for sampling. Or I can use a for loop and count instances. But this needs loops and some clumsy coding. I am looking for a simple and efficient (R-style) function that does better than this.

(I can write it myself, but I don't want to reinvent the wheel.)

Recommended answer

You can use cut, but you have to apply it to the numerical indices of the vector, i.e., seq(a), not to the vector itself. (seq_along(a) produces the same indices here and also behaves correctly when a has length 1.)

Then you split the vector into pieces of equal length with split:

split(a, cut(seq(a), 5, labels = FALSE))

This returns a list of five short vectors, each holding two consecutive elements of a.
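
To make the grouping visible, the following sketch separates the index step from the split step. The names idx and pieces are illustrative only, and seq_along(a) is used in place of seq(a) on the assumption that the two are interchangeable for this vector.

```r
a <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

# Bin the positions 1..10 (not the values) into 5 equal-width intervals;
# labels = FALSE returns integer bin codes instead of a factor.
idx <- cut(seq_along(a), 5, labels = FALSE)
print(idx)  # 1 1 2 2 3 3 4 4 5 5

# Group the values of a by the bin code of their position.
pieces <- split(a, idx)
print(lengths(pieces))
```

Here idx is the same as the vector b from the question, so the repeated values in a no longer matter: only the positions are binned.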

Another way, without cut, is the following (it assumes that length(a) is a multiple of 5):

split(a, rep(seq(5), each = length(a) / 5))
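
The rep() version depends on length(a) being a multiple of 5. For other lengths, one possible sketch is shown below. The helper name split_even is hypothetical (it is not part of base R), the ceiling-based grouping is a swapped-in technique rather than the answer's method, and the sketch assumes k is no larger than length(x).

```r
# Hypothetical helper: split x into k consecutive pieces whose sizes
# differ by at most one element. Assumes 1 <= k <= length(x).
split_even <- function(x, k) {
  n <- length(x)
  # Position i is assigned to group ceiling(i * k / n), a value in 1..k.
  split(x, ceiling(seq_along(x) * k / n))
}

a <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
print(lengths(split_even(a, 5)))     # five pieces of 2 elements each
print(lengths(split_even(1:12, 5)))  # piece sizes 2 2 3 2 3
```

When the length is divisible by k, this gives the same grouping as the rep() call above; otherwise the extra elements are spread across the pieces instead of triggering a recycling warning in split.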
