Categorising numerical and categorical variables into appropriate ranges in R


Question

 Df <- bball5
 str(bball5)
 'data.frame':  379 obs. of 9 variables:
 $ ID         : int  238 239 240 241 242 243 244 245 246 247 ...
 $ Sex        : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
 $ Sport      : Factor w/ 10 levels "BBall","Field",..: 1 1 1 1 1 1 1 1 1 1 
 $ Ht         : num  196 190 178 185 185 ...
 $ Wt         : num  78.9 74.4 69.1 74.9 64.6 63.7 75.2 62.3 66.5 62.9 ...
 $ BMI        : num  20.6 20.7 21.9 21.9 19 ...
 $ BMIc       : NA NA NA NA NA NA NA NA NA NA ...
 $ Sex_f      : Factor w/ 1 level "female": 1 1 1 1 1 1 1 1 1 1 ...
 $ Sex_m      : Factor w/ 1 level "male": NA NA NA NA NA NA NA NA NA NA ...

I would like to classify a set of numerical variables within a large dataset of 1000.

I need to classify BMI into the following ranges:

    (<18.50, 18.50-24.99, 25.00-29.99, >=30.00)

and label them, respectively, as:

  "Underweight" "Normal" "Overweight" "Obese" 

This is so that I can produce tables showing the relationships separately for males and females, by sport type.

I also need to confirm that the BMI has been calculated correctly, as I am finding it difficult to create a formula within the dataset for a new variable column, BMIc.

There are several missing values (NA) within each variable, and they give me errors when I try to calculate the new variable:

 bball5$BMIc <- bball5$BMI[bball5$BMI, c(bball5$wt/(bball5$Ht)^2 ]

I am unable to classify the BMI variable. I also need to keep the ID so that the records can still be matched.

Recommended answer

You can create a variable named BMIclass and fill it with the 4 categories like this:

bball5$BMIclass <- NA_character_  # rows with a missing BMI stay NA
bball5[which(bball5$BMI < 18.5), 'BMIclass'] <- "Underweight"
bball5[which(bball5$BMI >= 18.5 & bball5$BMI < 25), 'BMIclass'] <- "Normal"
bball5[which(bball5$BMI >= 25 & bball5$BMI < 30), 'BMIclass'] <- "Overweight"
bball5[which(bball5$BMI >= 30), 'BMIclass'] <- "Obese"
bball5$BMIclass <- as.factor(bball5$BMIclass)
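
As an alternative sketch, base R's cut() can assign all four classes in one step and leaves a missing BMI as NA. The toy data frame below is invented for illustration (it only mimics some columns of bball5), and the break points assume the conventional 18.5 / 25 / 30 cut-offs. split() and table() then give one Sport by BMIclass table per sex.

```r
# Toy data standing in for bball5 (values invented for illustration only)
toy <- data.frame(
  ID    = 1:6,
  Sex   = factor(c("female", "female", "male", "male", "male", "female")),
  Sport = factor(c("BBall", "Field", "BBall", "Field", "BBall", "BBall")),
  BMI   = c(17.9, 21.9, 24.99, 27.3, 31.2, NA)
)

# right = FALSE makes each interval closed on the left: [18.5, 25), [25, 30), ...
toy$BMIclass <- cut(
  toy$BMI,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right  = FALSE
)

# Rows keep their ID, and a missing BMI stays NA instead of getting a class
print(toy[, c("ID", "BMI", "BMIclass")])

# One Sport x BMIclass table per sex
tabs <- lapply(split(toy, toy$Sex), function(d) table(d$Sport, d$BMIclass))
print(tabs$female)
print(tabs$male)
```

Because the class is added as a new column, every row keeps its ID, so the result can still be matched back to the original records.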

As for BMIc, you can do this (below). It will still produce NA wherever a value is missing, but it will give you the correct BMIc wherever the data are available.

# Ht appears to be in centimetres (e.g. 196), so convert to metres first
bball5$BMIc <- bball5$Wt / (bball5$Ht / 100)^2
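
To confirm that the stored BMI matches a recalculation, the two columns can be compared within a tolerance, with missing rows left as NA. This is a minimal sketch on invented data: it assumes Ht is in centimetres and Wt in kilograms, and the 0.1 tolerance is an arbitrary choice to absorb rounding.

```r
# Toy data standing in for bball5 (values invented for illustration only)
toy <- data.frame(
  ID  = 1:4,
  Ht  = c(196, 190, NA, 185),      # assumed to be centimetres
  Wt  = c(78.9, 74.4, 69.1, 74.9), # assumed to be kilograms
  BMI = c(20.5, 20.6, 21.9, 25.0)  # last value is deliberately wrong
)

# Recompute BMI; any row with a missing Ht or Wt yields NA
toy$BMIc <- toy$Wt / (toy$Ht / 100)^2

# TRUE where the stored and recomputed values agree within the tolerance,
# NA where either value is missing
toy$BMI_ok <- abs(toy$BMI - toy$BMIc) < 0.1

# IDs of rows whose stored BMI disagrees with the recalculation
bad_ids <- toy$ID[which(!toy$BMI_ok)]
print(bad_ids)
```

Wrapping the condition in which() drops the NA comparisons, so only rows with a genuine mismatch are reported.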
