R dplyr: min max function not working in mutate


Problem description

I have an issue with dplyr that I cannot resolve. I also do not have a fully reproducible example, because the problem only occurs with the full dataset, which I cannot share.

I do the following:

t %>%
  group_by(id, add = TRUE) %>%
  summarise(
    minbplevel = min(ref, na.rm = TRUE),
    maxbplevel = max(ref, na.rm = TRUE)
  ) %>%
  filter(id %in% c(caseA, caseB))

which results in:

     id minbplevel maxbplevel
  (dbl)      (dbl)      (dbl)
1     B       33.0       73.0
2     A       39.4       80.4

But when I do this:

t %>%
  group_by(id, add = TRUE) %>%
  mutate(
    minbplevel = min(ref, na.rm = TRUE),
    maxbplevel = max(ref, na.rm = TRUE)
  ) %>%
  filter(id %in% c(caseA, caseB))

the result is:

   id   Level refparmax refparmin   ref meanbptest minbplevel maxbplevel
(dbl)   (chr)     (int)     (int) (dbl)      (dbl)      (dbl)      (dbl)
1  B          0SD        69        68  49.0   52.00000         33         73
2  B       min1SD        69        68  41.0   52.00000         33         73
3  B       min2SD        69        68  33.0   52.00000         33         73
4  B      plus1SD        69        68  59.0   52.00000         33         73
5  B      plus2SD        69        68  73.0   52.00000         33         73
6  A          0SD       100        95  56.4   35.33333         NA         NA
7  A       min1SD       100        95  47.4   35.33333         NA         NA
8  A       min2SD       100        95  39.4   35.33333         NA         NA
9  A      plus1SD       100        95  67.4   35.33333         NA         NA
10 A      plus2SD       100        95  80.4   35.33333         NA         NA

I have no clue why the NAs for case A are produced. Each time I try this on a subset of the data, the second case with data seems to be the problem, but that is just a hunch. Only one case out of 18850 shows this issue, and nothing identifiable distinguishes the problem case from the rest.

Please advise what I can try to solve this. I can think of workarounds, such as creating the summarised data and then merging the result with the original data, but I thought that dplyr would let me do this in one step.
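The summarise-then-merge workaround mentioned above could look roughly like the following. This is only a sketch in base R: the data frame `df`, the placeholder ids "A" and "B", and the extra NA row are assumptions for illustration, not the real data.

```r
# Toy data modelled on the masked sample in the question. The ids "A" and "B"
# are placeholders, and the trailing NA row is an assumption for illustration.
df <- data.frame(
  id  = c(rep("B", 5), rep("A", 5), "A"),
  ref = c(49, 41, 33, 59, 73, 56.4, 47.4, 39.4, 67.4, 80.4, NA)
)

# Step 1: one row per id with the group minimum and maximum.
# na.action = na.pass keeps the NA rows, so na.rm = TRUE does the dropping.
lo <- aggregate(ref ~ id, data = df, FUN = min, na.rm = TRUE, na.action = na.pass)
hi <- aggregate(ref ~ id, data = df, FUN = max, na.rm = TRUE, na.action = na.pass)
names(lo)[2] <- "minbplevel"
names(hi)[2] <- "maxbplevel"

# Step 2: merge the per-group values back onto every original row.
result <- merge(merge(df, lo, by = "id"), hi, by = "id")
print(result)
```

Note that `merge()` sorts the rows by `id`, so the original row order is not preserved.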

I tried both with and without the add = TRUE option. It makes no difference.

Maybe I am using it the wrong way.

Based on the comments, I tried:

subset(with(t,aggregate(ref~id, t, FUN= min, na.rm=TRUE, na.action= na.pass)),id %in% c(caseA,caseB))

which results in:

  id  ref
4  B 33.0
5  A 39.4


I have to mask some parts of the data.

dput(head(subset(t,id %in% c(caseA,caseB)) , 12))

gives the output below. As before, I replaced the actual ids with the variables caseB and caseA, and this is not the full dataset in which the problem occurs:

structure(list(id = c(caseB, caseB, caseB, caseB, caseB, 
caseA, caseA, caseA, caseA, caseA), Level = c("0SD", "min1SD", 
"min2SD", "plus1SD", "plus2SD", "0SD", "min1SD", "min2SD", "plus1SD", 
"plus2SD"), refparmax = c(69L, 69L, 69L, 69L, 69L, 100L, 100L, 
100L, 100L, 100L), refparmin = c(68L, 68L, 68L, 68L, 68L, 95L, 
95L, 95L, 95L, 95L), ref = c(49, 41, 33, 59, 73, 56.4, 47.4, 
39.4, 67.4, 80.4), meanbptest = c(52, 52, 52, 52, 52, 35.3333333333333, 
35.3333333333333, 35.3333333333333, 35.3333333333333, 35.3333333333333
)), .Names = c("id", "Level", "refparmax", "refparmin", "ref", 
"meanbptest"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), vars = list(id), drop = TRUE, indices = list(
    0:4, 5:9), group_sizes = c(5L, 5L), biggest_group_size = 5L, labels = structure(list(
    id = c(caseB, caseA)), class = "data.frame", row.names = c(NA, 
-2L), vars = list(id), drop = TRUE, .Names = "id"))

Recommended answer

If I replace all NAs in the ref column with zeros, the mutate step works fine. As aosmith suggested, this probably has to do with the mutate and NA issue that is fixed in the development version of dplyr.
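Since the NAs in ref appear to be involved, it may help to see which ids carry them before replacing anything. A minimal base R sketch, using a made-up data frame `df` whose ids and NA position are assumptions for illustration:

```r
# Toy data; the ids and the position of the NA are assumptions for illustration.
df <- data.frame(
  id  = c("B", "B", "A", "A", "C"),
  ref = c(49, 33, 56.4, 39.4, NA)
)

# Number of missing ref values per id.
na_per_id <- tapply(is.na(df$ref), df$id, sum)
print(na_per_id)

# ids whose ref values are all missing. For such a group,
# min(..., na.rm = TRUE) returns Inf with a warning rather than a number.
group_sizes <- tapply(df$ref, df$id, length)
all_na_ids <- names(na_per_id)[na_per_id == group_sizes]
print(all_na_ids)
```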

I cannot test that suggestion because of workstation restrictions, though. So I will work around the issue with the NA replacement step and process the zero values after the summary steps.
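The zero placeholder has to be undone afterwards, and it can distort the minimum of any group that contained an NA. One alternative that leaves the NAs in place and avoids mutate altogether is base R's `ave()`. This is a sketch on made-up data (the data frame `df`, the ids "A" and "B", and the NA row are assumptions); it has not been checked against the dataset or dplyr version from the question.

```r
# Toy data modelled on the masked sample in the question. The ids "A" and "B"
# are placeholders, and the trailing NA row is an assumption for illustration.
df <- data.frame(
  id  = c(rep("B", 5), rep("A", 5), "A"),
  ref = c(49, 41, 33, 59, 73, 56.4, 47.4, 39.4, 67.4, 80.4, NA)
)

# ave() applies FUN within each id and returns a vector as long as the input,
# so the group value is repeated on every row of that group.
df$minbplevel <- ave(df$ref, df$id, FUN = function(x) min(x, na.rm = TRUE))
df$maxbplevel <- ave(df$ref, df$id, FUN = function(x) max(x, na.rm = TRUE))
print(df)
```

Unlike the merge approach, this keeps the original row order. A group whose ref values are all NA would get Inf and -Inf (with a warning), so such groups may need separate handling.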
