Replace NA with mode based on ID attribute
Question
I have a dataset dt and I want to replace the NA values with the mode of each attribute, grouped by id, as follows:
Before:
id att
1 v
1 v
1 NA
1 c
2 c
2 v
2 NA
2 c
The result I am looking for is:
id att
1 v
1 v
1 v
1 c
2 c
2 v
2 c
2 c
I have made some attempts. For example, I found a similar question that replaces NA with the mean (which has a built-in function), so I tried to adapt that code as follows:
for (i in 1:dim(dt)[1]) {
  if (is.na(dt$att[i])) {
    att_mode <- # I am stuck here to return the mode of an attribute based on ID
    dt$att[i] <- att_mode
  }
}
I found the following function to calculate the mode:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
Link: Is there a built-in function for finding the mode?
But I have no idea how to apply it inside the for loop. I tried the apply and ave functions, but they do not seem to be the right choice because of the differing dimensions.
Could anyone help me return the mode in my for loop?
Thanks
Recommended answer
We can use na.aggregate from library(zoo) and specify FUN as Mode. If this is a group-by operation, we can do it with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)); then, grouped by 'id', apply na.aggregate:
library(data.table)
library(zoo)
setDT(df1)[, att:= na.aggregate(att, FUN=Mode), by = id]
df1
# id att
#1: 1 v
#2: 1 v
#3: 1 v
#4: 1 c
#5: 2 c
#6: 2 v
#7: 2 c
#8: 2 c
A similar approach with dplyr:
library(dplyr)
df1 %>%
  group_by(id) %>%
  mutate(att = na.aggregate(att, FUN=Mode))
NOTE: Mode is taken from the OP's post. Also, this assumes that 'att' is of character class.
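For readers who prefer to avoid extra packages, the same grouped fill can be sketched in base R. This is only an illustration under stated assumptions: df1 is rebuilt here from the example table in the question with att stored as character, and fill_na_mode is a hypothetical helper name that does not appear in the original post. The block shows both an ave() version and a completed variant of the question's for loop, so the two can be compared.

```r
# Mode as defined in the question: most frequent value, first seen wins ties
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Example data rebuilt from the question (assumption: att is character)
df1 <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  att = c("v", "v", NA, "c", "c", "v", NA, "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NAs in x with the mode of the non-missing values
fill_na_mode <- function(x) {
  x[is.na(x)] <- Mode(x[!is.na(x)])
  x
}

# Grouped version: ave() applies the helper within each id
# and returns a vector in the original row order
filled <- df1
filled$att <- ave(df1$att, df1$id, FUN = fill_na_mode)

# Loop version: for each NA row, take the mode of the non-missing
# values that share the same id (read from the untouched df1)
looped <- df1
for (i in seq_len(nrow(looped))) {
  if (is.na(looped$att[i])) {
    same_id <- df1$att[df1$id == df1$id[i]]
    looped$att[i] <- Mode(same_id[!is.na(same_id)])
  }
}

print(filled)
```

Both versions drop the NA values before calling Mode, so a missing value is never chosen as the most frequent one. The ave() form is usually the shorter choice; the loop is included only because the question asked for it.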