How to finish code to replace NA with median in R

Question

I am very new to R, so please please be gentle.

I am working on the Kaggle Titanic competition, to get me into R and working things out.

I am working my way through engineering a feature and I am a bit stuck with the logic of what to do next.

So, here goes. My goal is to take the Age data and replace all of the NA with the median of age for the title of the person. e.g. if the person is a master, I want to get the median of all the masters and replace the NA with that median. Same for Mr. and so on.

I have managed to create myself a data.frame containing title and age as follows:

library(tibble)
data.combined <-
  tibble(
    data.combined.new.title = c(
      "Mr.",
      "Mrs.",
      "Miss",
      "Mrs.",
      "Mr.",
      "Mr.",
      "Mr.",
      "Master",
      "Mrs."
    ),
    data.combined.Age = c(22, 38, 26, 35, 35, NA, 54, 2, 27)
  )

As you can see in this list, there is a Mr. with an NA next to his age. I want to replace that NA with the median age of all the other Mr. entries in the list.
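
For this example the target value is easy to check by hand: the non-missing Mr. ages are 22, 35 and 54, so their median is 35. A minimal base-R sketch that computes the median for every title at once with `tapply()`, rebuilding the example as a plain data.frame so it runs without tibble (the name `title_medians` is just illustrative):

```r
# Example data with the question's column names
agedata <- data.frame(
  data.combined.new.title = c("Mr.", "Mrs.", "Miss", "Mrs.", "Mr.",
                              "Mr.", "Mr.", "Master", "Mrs."),
  data.combined.Age = c(22, 38, 26, 35, 35, NA, 54, 2, 27)
)

# Median age per title; na.rm = TRUE is passed through to median()
title_medians <- tapply(agedata$data.combined.Age,
                        agedata$data.combined.new.title,
                        median, na.rm = TRUE)
title_medians
# Master 2, Miss 26, Mr. 35, Mrs. 35
```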

So I have the following code, which gets me as far as replacing the NAs with the median of the whole data set.

#Creates my data.frame
agedata <- data.frame(data.combined$new.title, data.combined$Age)

#replace NA with the median of the whole data set
agedata$data.combined.Age[is.na(agedata$data.combined.Age)] <- median(agedata$data.combined.Age, na.rm = TRUE)

What I just don't get is how I would add to this code to replace each NA with the median of its title group (Mr., Master, Mrs., Miss)?
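
One way to extend the code above, sketched in base R under the assumption that `agedata` has the two columns shown in the example: compute the per-title medians with `tapply()`, then fill only the missing rows by looking up each row's title in that named vector. `title_medians`, `missing_age` and `missing_titles` are illustrative names, not part of the original code.

```r
# Example data with the question's column names
agedata <- data.frame(
  data.combined.new.title = c("Mr.", "Mrs.", "Miss", "Mrs.", "Mr.",
                              "Mr.", "Mr.", "Master", "Mrs."),
  data.combined.Age = c(22, 38, 26, 35, 35, NA, 54, 2, 27)
)

# Median age for each title, computed from the non-missing ages only
title_medians <- tapply(agedata$data.combined.Age,
                        agedata$data.combined.new.title,
                        median, na.rm = TRUE)

# Which rows need filling
missing_age <- is.na(agedata$data.combined.Age)

# Look up each missing row's title in the named medians;
# as.character() guards against the title column being a factor
missing_titles <- as.character(agedata$data.combined.new.title[missing_age])
agedata$data.combined.Age[missing_age] <- title_medians[missing_titles]

agedata$data.combined.Age
# 22 38 26 35 35 35 54 2 27
```

If every age for some title is missing, that title's median is NA and its rows stay NA.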

Any pointers gratefully received.

I'm not too interested in whether this is going to help with my prediction for Kaggle at this point, more with how the code should look.

Many thanks.

Recommended answer

Or this tidyverse one-liner:

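library(dplyr)
agedata <- agedata %>% group_by(data.combined.new.title) %>% mutate(data.combined.Age = ifelse(is.na(data.combined.Age), median(data.combined.Age, na.rm = TRUE), data.combined.Age))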
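
For comparison, a base-R sketch that needs no packages: `ave()` splits a vector by a grouping variable, applies `FUN` to each piece and puts the results back in the original row order, which is roughly what `group_by()` followed by `mutate()` does. It assumes the same column names as the question's `agedata`.

```r
# Example data with the question's column names
agedata <- data.frame(
  data.combined.new.title = c("Mr.", "Mrs.", "Miss", "Mrs.", "Mr.",
                              "Mr.", "Mr.", "Master", "Mrs."),
  data.combined.Age = c(22, 38, 26, 35, 35, NA, 54, 2, 27)
)

# Within each title group, replace NAs with that group's median age
agedata$data.combined.Age <- ave(
  agedata$data.combined.Age,
  agedata$data.combined.new.title,
  FUN = function(age) replace(age, is.na(age), median(age, na.rm = TRUE))
)

agedata$data.combined.Age
# 22 38 26 35 35 35 54 2 27
```

As with the dplyr version, a title whose ages are all missing keeps its NA.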
