Count the number of occurrences in different timelines


Question

I have data like this:

library(dplyr)
library(tidyverse)

df <- tibble(mydate = as.Date(c("2019-05-11 23:01:00", "2019-05-11 23:02:00", "2019-05-11 23:03:00", "2019-05-11 23:04:00",
                                "2019-05-12 23:05:00", "2019-05-12 23:06:00", "2019-05-12 23:07:00", "2019-05-12 23:08:00",
                                "2019-05-13 23:09:00", "2019-05-13 23:10:00", "2019-05-13 23:11:00", "2019-05-13 23:12:00",
                                "2019-05-14 23:13:00", "2019-05-14 23:14:00", "2019-05-14 23:15:00", "2019-05-14 23:16:00",
                                "2019-05-15 23:17:00", "2019-05-15 23:18:00", "2019-05-15 23:19:00", "2019-05-15 23:20:00")),
               myval = c(0, NA, 1500, 1500,
                         1500, 1500, NA, 0,
                         0, 0, 1100, 1100,
                         1100, 0, 200, 200,
                         1100, 1100, 1100, 0
               ))

I want to divide each value by the number of times it appears. However, if a run of one value (for example 1100) is interrupted by another number (or NA) and the value then reappears, I want to count each run separately.

# just replace values [0,1] with NA
df$myval[df$myval >= 0 & df$myval <= 1] <- NA

df <- df %>%
    group_by(myval) %>%
    mutate(counts = sum(myval == myval)) %>%
    mutate(result = (myval  / counts))

The current result is:

 mydate     myval counts result
   <date>     <dbl>  <int>  <dbl>
 1 2019-05-11    NA     NA    NA 
 2 2019-05-11    NA     NA    NA 
 3 2019-05-11  1500      4   375 
 4 2019-05-11  1500      4   375 
 5 2019-05-12  1500      4   375 
 6 2019-05-12  1500      4   375 
 7 2019-05-12    NA     NA    NA 
 8 2019-05-12    NA     NA    NA 
 9 2019-05-13    NA     NA    NA 
10 2019-05-13    NA     NA    NA 
11 2019-05-13  1100      6   183.
12 2019-05-13  1100      6   183.
13 2019-05-14  1100      6   183.
14 2019-05-14    NA     NA    NA 
15 2019-05-14   200      2   100 
16 2019-05-14   200      2   100 
17 2019-05-15  1100      6   183.
18 2019-05-15  1100      6   183.
19 2019-05-15  1100      6   183.
20 2019-05-15    NA     NA    NA 

But as you can see, the value 1100, which appears in two separate runs, is counted 6 times. I want it counted 3 times for the first run and 3 times for the second.

So, for example, the value 1500 appears 4 times in a row, so I divide 1500 / 4. The value 1100 should be divided by 3 in its first run, and by 3 again in its second run.

Recommended answer

You can do that using run-length encoding, which collapses each stretch of consecutive identical values into a single (value, length) pair. A value that reappears after being interrupted by another value starts a new run with its own length.
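To make the idea concrete, here is a minimal sketch of what base R's `rle()` returns. The vector `x` is a made-up illustration, not the question's data.

```r
# Hypothetical example vector: 1100 occurs in two separate runs.
x <- c(1100, 1100, 1100, 0, 200, 200, 1100, 1100, 1100)

runs <- rle(x)

runs$values   # the value of each run:  1100    0  200 1100
runs$lengths  # the length of each run:    3    1    2    3
```

Each run of 1100 gets its own length of 3 rather than a combined count of 6. Applied to the question's data: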

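library(magrittr)  # provides %$%, which library(tidyverse) does not attach

# The output below contains zeros, so it corresponds to the original myval
# column, before the values in [0, 1] are replaced with NA.
rle(df$myval) %$%
  tibble(rle = lengths,
         myval = values,
         avg = values / rle)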
# A tibble: 10 x 3
#     rle myval   avg
#    <int> <dbl> <dbl>
# 1     1     0    0 
# 2     1    NA   NA 
# 3     4  1500  375 
# 4     1    NA   NA 
# 5     3     0    0 
# 6     3  1100  367.
# 7     1     0    0 
# 8     2   200  100 
# 9     3  1100  367.
# 10     1     0    0 
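The table above has one row per run. If the per-run count is needed on every original row, as in the question's desired output, one possible approach is to expand the run lengths back to the full length with `rep()`. This is a sketch rather than part of the original answer. It assumes `myval` has already had its values in [0, 1] replaced with NA, and the names `runs` and `result` are only illustrative.

```r
library(dplyr)

# The question's myval column after values in [0, 1] were set to NA.
df <- tibble(
  myval = c(NA, NA, 1500, 1500,
            1500, 1500, NA, NA,
            NA, NA, 1100, 1100,
            1100, NA, 200, 200,
            1100, 1100, 1100, NA)
)

# Run-length encode the column; rle() treats every NA as its own run of length 1.
runs <- rle(df$myval)

result <- df %>%
  mutate(
    # Repeat each run's length once per row of that run.
    counts = rep(runs$lengths, times = runs$lengths),
    # Divide each value by the length of the run it belongs to.
    result = myval / counts
  )

print(result, n = 20)
```

With this, the rows holding 1500 get `counts` of 4, and each of the two runs of 1100 gets `counts` of 3, so `result` is 1100 / 3 in both. NA rows keep an NA `result`, although their `counts` is 1 rather than NA, because `rle()` does not merge adjacent NA values.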
