Replace NA with mode based on ID attribute

Question

I have a dataset dt and I want to replace the NA values in each attribute with the mode of that attribute within the same id, as follows:

Before:

 id  att  
  1  v
  1  v
  1  NA
  1  c
  2  c
  2  v
  2  NA
  2  c

The result I am looking for is:

 id  att
  1  v
  1  v
  1  v
  1  c
  2  c
  2  v
  2  c
  2  c

I have made some attempts. For example, I found a similar question that replaces NA with the mean (which has a built-in function), so I tried to adapt that code as follows:

for (i in 1:dim(dt)[1]) {
    if (is.na(dt$att[i])) {
      att_mode <-                  # I am stuck here to return the mode of an attribute based on ID
      dt$att[i] <- att_mode 
    }
  }

I found the following function to calculate the mode:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

It comes from the linked question "Is there a built-in function for finding the mode?"

But I have no idea how to apply it inside the for loop. I tried the apply and ave functions, but they do not seem to be the right choice because of the different dimensions.

Could anyone help me return the mode inside my for loop?

Thanks

Recommended answer

We can use na.aggregate from library(zoo) and specify FUN as Mode. Since this is a group-by operation, we can do it with data.table: convert the 'data.frame' to a 'data.table' with setDT(df1) and, grouped by 'id', apply na.aggregate to 'att'.
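The answer's code assumes that a data frame named df1 and the Mode function from the question already exist in the session. A minimal setup sketch, with the values copied from the question's table and 'att' assumed to be character, could look like this:

```r
# Example data copied from the question; 'att' is kept as character.
df1 <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  att = c("v", "v", NA, "c", "c", "v", NA, "c"),
  stringsAsFactors = FALSE
)

# Mode() as quoted in the question: returns the most frequent value,
# and the first such value when there is a tie.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Mode() itself counts NA like any other value, so it is worth
# checking what it returns for one group before filling anything in.
mode_id1 <- Mode(df1$att[df1$id == 1])
```

Because Mode counts NA as a value, calling it directly on a group where NA is the most frequent entry would return NA.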

library(data.table)
library(zoo)
setDT(df1)[, att:= na.aggregate(att, FUN=Mode), by = id]
df1
#    id att
#1:  1   v
#2:  1   v
#3:  1   v
#4:  1   c
#5:  2   c
#6:  2   v
#7:  2   c
#8:  2   c


Or with dplyr:

library(dplyr)
library(zoo)   # provides na.aggregate
df1 %>%
     group_by(id) %>%
     mutate(att = na.aggregate(att, FUN=Mode))

NOTE: Mode is the function from the OP's post. Also, this assumes that 'att' is of class character.
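For readers who would rather avoid extra packages, the same idea can be sketched in base R. This is not part of the answer above: mode_no_na and fill_mode are hypothetical helper names introduced here, and the data frame is rebuilt under the question's name dt. The sketch shows both an ave call and an explicit loop close to the one in the question.

```r
# Hypothetical helper: mode of the non-missing values only.
mode_no_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)   # a group with nothing but NA stays NA
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: fill the NAs of one group with that group's mode.
fill_mode <- function(x) {
  x[is.na(x)] <- mode_no_na(x)
  x
}

# Example data copied from the question; 'att' is assumed to be character.
dt <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  att = c("v", "v", NA, "c", "c", "v", NA, "c"),
  stringsAsFactors = FALSE
)

# ave() applies fill_mode() within each id and returns a vector
# of the same length as the input, so the dimensions line up.
filled <- ave(dt$att, dt$id, FUN = fill_mode)

# The same idea as an explicit loop, closer to the code in the question.
looped <- dt$att
for (i in seq_len(nrow(dt))) {
  if (is.na(looped[i])) {
    looped[i] <- mode_no_na(dt$att[dt$id == dt$id[i]])
  }
}
```

The ave version works because the function passed as FUN returns a vector as long as its group rather than a single value; passing Mode directly would instead overwrite every row in the group with the mode.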
