R dplyr: min max function not working in mutate
Question
I have an issue with dplyr that I cannot resolve. I also do not have a fully workable example, since the problem only occurs with the full data set (which I cannot share).
I do the following:
t %>% group_by(id, add=TRUE) %>%
summarise(minbplevel = min(ref, na.rm=T)
,maxbplevel = max(ref, na.rm=T)
) %>% filter(id %in% c(caseA,caseB))
which results in:
id minbplevel maxbplevel
(dbl) (dbl) (dbl)
1 B 33.0 73.0
2 A 39.4 80.4
But when I do this:
t %>% group_by(id, add=TRUE) %>%
mutate(minbplevel = min(ref, na.rm=T)
,maxbplevel = max(ref, na.rm=T)
) %>% filter(id %in% c(caseA,caseB))
the result is:
id Level refparmax refparmin ref meanbptest minbplevel maxbplevel
(dbl) (chr) (int) (int) (dbl) (dbl) (dbl) (dbl)
1 B 0SD 69 68 49.0 52.00000 33 73
2 B min1SD 69 68 41.0 52.00000 33 73
3 B min2SD 69 68 33.0 52.00000 33 73
4 B plus1SD 69 68 59.0 52.00000 33 73
5 B plus2SD 69 68 73.0 52.00000 33 73
6 A 0SD 100 95 56.4 35.33333 NA NA
7 A min1SD 100 95 47.4 35.33333 NA NA
8 A min2SD 100 95 39.4 35.33333 NA NA
9 A plus1SD 100 95 67.4 35.33333 NA NA
10 A plus2SD 100 95 80.4 35.33333 NA NA
I have no clue why the NAs are produced for case A. Each time I try this on a subset of the data, the second case with data seems to be the problem, but that is just a hunch. Only one of the 18850 cases gives this issue, and nothing identifiable makes the problem case different from the rest.
Please advise what I can try to solve this. I can think of workarounds, such as creating the summarised data and then merging the result with the original data, but I thought dplyr would let me do this in one step.
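The two-step workaround mentioned above could look roughly like the following. This is only a sketch on toy data: the data frame `toy` and the ids "A" and "B" are placeholders standing in for the masked table, and the code has not been run against the original data set.

```r
library(dplyr)

# Toy data standing in for the masked table (assumption: ids "A" and "B"
# replace the real ids, and only the columns needed here are included).
toy <- data.frame(
  id  = rep(c("B", "A"), each = 5),
  ref = c(49, 41, 33, 59, 73, 56.4, 47.4, 39.4, 67.4, 80.4)
)

# Step 1: one row per id holding the group-wise minimum and maximum.
ranges <- toy %>%
  group_by(id) %>%
  summarise(
    minbplevel = min(ref, na.rm = TRUE),
    maxbplevel = max(ref, na.rm = TRUE)
  )

# Step 2: attach the per-id values to every original row.
result <- left_join(toy, ranges, by = "id")
```

Because the summary is computed with `summarise`, which returned the expected values for both cases, the join only copies those values back onto the rows and does not rely on `mutate`.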
I tried removing or adding the add = TRUE option. That does not make any difference.
Maybe I am using it the wrong way.
Based on the comments, I tried:
subset(with(t,aggregate(ref~id, t, FUN= min, na.rm=TRUE, na.action= na.pass)),id %in% c(caseA,caseB))
which results in:
id ref
4 B 33.0
5 A 39.4
I have to mask some parts of the data.
dput(head(subset(t,id %in% c(caseA,caseB)) , 12))
gives the output below. Again, I replaced the actual ids with the variables caseB and caseA, and this is not the full data set in which the problem occurs.
structure(list(id = c(caseB, caseB, caseB, caseB, caseB,
caseA, caseA, caseA, caseA, caseA), Level = c("0SD", "min1SD",
"min2SD", "plus1SD", "plus2SD", "0SD", "min1SD", "min2SD", "plus1SD",
"plus2SD"), refparmax = c(69L, 69L, 69L, 69L, 69L, 100L, 100L,
100L, 100L, 100L), refparmin = c(68L, 68L, 68L, 68L, 68L, 95L,
95L, 95L, 95L, 95L), ref = c(49, 41, 33, 59, 73, 56.4, 47.4,
39.4, 67.4, 80.4), meanbptest = c(52, 52, 52, 52, 52, 35.3333333333333,
35.3333333333333, 35.3333333333333, 35.3333333333333, 35.3333333333333
)), .Names = c("id", "Level", "refparmax", "refparmin", "ref",
"meanbptest"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), vars = list(id), drop = TRUE, indices = list(
0:4, 5:9), group_sizes = c(5L, 5L), biggest_group_size = 5L, labels = structure(list(
id = c(caseB, caseA)), class = "data.frame", row.names = c(NA,
-2L), vars = list(id), drop = TRUE, .Names = "id"))
Recommended answer
If I replace all NAs in the ref column with zeros, the mutate step works fine. As aosmith suggested, this probably has something to do with the mutate and NA issue that is fixed in the development version of dplyr.
I cannot test this suggestion due to workstation restrictions, though. So I will work around the issue with the NA replacement step and process the zero values after the summary steps.
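For readers who would rather leave the NAs in place, a different technique from the zero replacement described above is base R's `ave()`, which fills a per-group value into every row without going through dplyr. The sketch below uses made-up toy data (the data frame `toy` and the ids "A" and "B" are assumptions) and has not been tested on the original data set.

```r
# Toy data with missing values in ref (assumption: stands in for the real table).
toy <- data.frame(
  id  = rep(c("B", "A"), each = 3),
  ref = c(49, NA, 33, 56.4, 39.4, NA)
)

# ave() splits ref by id, applies FUN to each group, and recycles the
# single result across all rows of that group.
toy$minbplevel <- ave(toy$ref, toy$id, FUN = function(x) min(x, na.rm = TRUE))
toy$maxbplevel <- ave(toy$ref, toy$id, FUN = function(x) max(x, na.rm = TRUE))
```

Note that a group whose ref values are all NA would produce Inf and -Inf with a warning, so such groups would still need separate handling.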