substr in dplyr %>% mutate


Question

library(dplyr)

# Sample trips: trip number, processing date, and recorded delay
pcd <- data.frame(tripNo = c(618, 618, 610, 610, 610, 619),
                  procDate = as.Date(c('2016-03-02', '2016-03-03', '2016-03-02', '2016-03-03', '2016-03-02', '2016-03-03')),
                  delay = c(7.45, 12.90, 11.88, 6.66, 12.50, 9.41))

I want to flag inconsistencies in trips processed on two different days, where the delay recorded on the second day is shorter than the last delay on the previous day. This is how I do it now:

pcd %>%
  arrange(tripNo, procDate, delay) %>% 
  group_by(tripNo) %>% 
  mutate(delayErr = (row_number() != 1) & (delay < lag(delay)),
         Alert = ifelse(delayErr, '!', '')) %>%
  select(tripNo, procDate, delay, delayErr, Alert)

  tripNo   procDate delay delayErr Alert
   (dbl)     (date) (dbl)    (lgl) (chr)
1    610 2016-03-02 11.88    FALSE      
2    610 2016-03-02 12.50    FALSE      
3    610 2016-03-03  6.66     TRUE     !
4    618 2016-03-02  7.45    FALSE      
5    618 2016-03-03 12.90    FALSE      
6    619 2016-03-03  9.41    FALSE      

This works fine. My question is about my first attempt, in which I tried to use substr:

pcd %>%
  arrange(tripNo, procDate, delay) %>%
  group_by(tripNo) %>%
  mutate(delayErr = (row_number() != 1) & (delay < lag(delay)),
         Alert = substr(' !', delayErr + 1, delayErr + 1)) %>%  # <<< This is the only change
  select(tripNo, procDate, delay, delayErr, Alert)

  tripNo   procDate delay delayErr Alert
   (dbl)     (date) (dbl)    (lgl) (chr)
1    610 2016-03-02 11.88    FALSE      
2    610 2016-03-02 12.50    FALSE      
3    610 2016-03-03  6.66     TRUE      
4    618 2016-03-02  7.45    FALSE      
5    618 2016-03-03 12.90    FALSE      
6    619 2016-03-03  9.41    FALSE      

With this code, the Alert column does not show the '!' I expected in row 3. Could someone explain why the second dplyr query doesn't work?

Thanks!

Recommended answer

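There is already a version of substr that is vectorized over its start and stop arguments: substring. substr returns a result of the same length as its first argument, so with the single string ' !' it uses only the first start/stop value in each group, and mutate recycles that one character across the group's rows. substring recycles its arguments to the longest length, so it returns one character per row: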

pcd %>%
  arrange(tripNo, procDate, delay) %>% 
  group_by(tripNo) %>% 
  mutate(delayErr = (row_number() != 1) & (delay < lag(delay)),
         Alert = substring(' !', delayErr +1, delayErr +1)) %>% 
  select(tripNo, procDate, delay, delayErr, Alert)
#   tripNo   procDate delay delayErr Alert
#   (dbl)     (date) (dbl)    (lgl) (chr)
#1    610 2016-03-02 11.88    FALSE      
#2    610 2016-03-02 12.50    FALSE      
#3    610 2016-03-03  6.66     TRUE     !
#4    618 2016-03-02  7.45    FALSE      
#5    618 2016-03-03 12.90    FALSE      
#6    619 2016-03-03  9.41    FALSE      
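
The difference can also be checked in base R, outside any dplyr pipeline. The following is a minimal sketch, assuming the three delayErr values of trip 610 from the output above; the variable names pos, from_substr and from_substring are made up for illustration.

```r
# delayErr values for trip 610, taken from the output above
delayErr <- c(FALSE, FALSE, TRUE)
pos <- delayErr + 1  # logicals coerce to 0/1, so pos is 1, 1, 2

# substr(): the result has the same length as x (here 1),
# so only the first start/stop value is used
from_substr <- substr(" !", pos, pos)

# substring(): arguments are recycled to the longest length,
# giving one character per position
from_substring <- substring(" !", pos, pos)

print(from_substr)     # a single space
print(from_substring)  # space, space, "!"
```

Inside mutate, the length-1 result of substr is recycled to every row of the group, which would explain the blank Alert column in the question. A plain index lookup such as c('', '!')[delayErr + 1] should work as well, since indexing is vectorized over the positions.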
