Create categorical variable in R based on range

Problem description

I have a data frame with a column of integers that I would like to use as a reference to make a new categorical variable. I want to divide the variable into three groups and set the ranges myself (i.e. 0-5, 6-10, etc.). I tried cut, but that appeared to divide the variable into groups based on a normal distribution, and my data is right-skewed. I have also tried if/then statements, but these output a true/false value and I would like to keep my original variable. I am sure there is a simple way to do this, but I cannot seem to figure it out. Any advice on a quick, simple way to do this?

I had something in mind like this:

x   x.range
3   0-5
4   0-5
6   6-10
12  11-15

Solution

Ian's answer (cut) is the most common way to do this, as far as I know.

I prefer to use shingle, from the lattice package.

The argument that specifies the binning intervals seems a little more intuitive to me.

You use shingle like so:

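library(lattice)  # shingle() comes from the lattice package

# mock some data
data = sample(0:40, 200, replace = TRUE)

# each bin is a c(lower, upper) pair
a = c(0, 5); b = c(5, 9); c = c(9, 19); d = c(19, 33); e = c(33, 41)

# stack the pairs into a two-column matrix, one row per bin
my_bins = matrix(rbind(a, b, c, d, e), ncol = 2)

# my_bins returns (the binning intervals I've set):
#      [,1] [,2]
# [1,]    0    5
# [2,]    5    9
# [3,]    9   19
# [4,]   19   33
# [5,]   33   41

shx = shingle(data, intervals = my_bins)

# typing 'shx' at the interactive prompt gives a nice frequency table
# (the counts change from run to run because the data is random):
# Intervals:
#   min max count
# 1   0   5    23
# 2   5   9    17
# 3   9  19    56
# 4  19  33    76
# 5  33  41    46
# note: the counts sum to more than 200 because a value on a shared
# endpoint (e.g. 5) falls in both neighbouring intervals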
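
For comparison, here is a minimal sketch of the cut approach mentioned at the top of this answer, using the ranges from the question. The data frame df and the label strings are made up for illustration; only the values 3, 4, 6 and 12 come from the question. Note that cut does not assume any distribution: it uses the breaks you pass, and only falls back to equal-width intervals when breaks is a single number.

```r
# hypothetical example data; the x values are the ones shown in the question
df <- data.frame(x = c(3, 4, 6, 12))

# breaks are the bin edges. With the default right = TRUE each bin is
# (lower, upper], and include.lowest = TRUE also admits the lowest edge,
# so the bins here are [0,5], (5,10] and (10,15].
df$x.range <- cut(df$x,
                  breaks = c(0, 5, 10, 15),
                  labels = c("0-5", "6-10", "11-15"),
                  include.lowest = TRUE)

print(df)
```

The original x column is kept and x.range is added as a factor next to it. Unlike the shingle intervals above, the cut bins do not overlap, so every value lands in exactly one category.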
