Summarizing by group of two variables
Question
Consider a simplified dataset (the real one has more rows and columns):
df
tp tf weight
1 FWD RF 78.86166
2 MF LF 81.04566
3 DEF LF 80.70527
4 DEF LF 82.96071
5 DEF RF 78.42544
6 GK LF 79.37686
7 DEF RF 78.79928
8 MF RF NA
9 MF RF 78.93815
10 DEF RF 80.00284
I want to fill the missing values in weight with the median of the group defined by tp and tf combined.
What I have tried so far is the following (I used dplyr):
temp <- df %>% group_by(tp, tf) %>% summarise(mvalue = median(weight, na.rm = TRUE))
This gives temp as:
temp
Source: local data frame [6 x 3]
Groups: tp [?]
tp tf mvalue
<fctr> <fctr> <dbl>
1 DEF LF 81.83299
2 DEF RF 78.79928
3 FWD RF 78.86166
4 GK LF 79.37686
5 MF LF 81.04566
6 MF RF 78.93815
Now I am unable to figure out how to fill the missing values in df with the corresponding group median.
In my simple case there is only one NA, corresponding to tp = MF and tf = RF; looking it up in temp, the median for that group is 78.93815.
How do I do this in general? Do suggest a better approach than my initial one if you have it.
Edit:
If
Recommended answer
You can try:
library(dplyr)
df %>%
group_by(tp, tf) %>%
mutate(weight = replace(weight, is.na(weight), median(weight, na.rm = TRUE)))
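To make the suggestion easy to try, here is a minimal, self-contained sketch that rebuilds the sample data from the question and applies the grouped replacement. Some parts are assumptions rather than part of the original answer: the data.frame() construction (with character columns instead of factors), the names filled and filled_join, the trailing ungroup(), and the .groups = "drop" argument, which assumes dplyr 1.0.0 or later. The second half sketches one possible way to finish the asker's own approach by joining the temp summary back onto df and using coalesce().

```r
library(dplyr)

# Rebuild the sample data from the question (weights copied from the post).
df <- data.frame(
  tp = c("FWD", "MF", "DEF", "DEF", "DEF", "GK", "DEF", "MF", "MF", "DEF"),
  tf = c("RF", "LF", "LF", "LF", "RF", "LF", "RF", "RF", "RF", "RF"),
  weight = c(78.86166, 81.04566, 80.70527, 82.96071, 78.42544,
             79.37686, 78.79928, NA, 78.93815, 80.00284)
)

# Approach from the answer: within each (tp, tf) group, replace NA weights
# with that group's median, computed while ignoring the NAs.
filled <- df %>%
  group_by(tp, tf) %>%
  mutate(weight = replace(weight, is.na(weight), median(weight, na.rm = TRUE))) %>%
  ungroup()  # assumption: drop the grouping so later steps are not grouped

# Alternative built on the question's summary table: compute the group
# medians, join them back onto df, and use the median only where weight is NA.
temp <- df %>%
  group_by(tp, tf) %>%
  summarise(mvalue = median(weight, na.rm = TRUE), .groups = "drop")

filled_join <- df %>%
  left_join(temp, by = c("tp", "tf")) %>%
  mutate(weight = coalesce(weight, mvalue)) %>%
  select(-mvalue)

# Row 8 (tp = MF, tf = RF) was the only NA; it now holds the group median.
filled$weight[8]
```

Both variants keep the original row order, so the results can be compared directly. If every weight in a group is missing, there is no median to fill from and those values would be expected to stay NA.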