How to get frequency count of date based on condition in R?


Problem description

Below is my scenario.

Scenario


I have two data frames. The first contains data about system usage and the second contains data about system locations. I would like to track instrument usage based on the date the system was used and the location where the instrument is located. For this I am performing a full outer join on the data frames using the dplyr library. Next, I would like to get the frequency count of the systems based on date, so I group by Systems and Locations. If a system is not in use, its frequency count should be 0. Now look at Sys6, which is at location loc3: since the instrument is not in use (it has no Date, so assume it is not used), its frequency count should be 0, because neither the Date nor the Users column contains any data. However, the code below returns a frequency count of 1. I am not sure what could be wrong. Below are the current and expected outputs.

Please provide an explanation along with the code.

Data frame 1:

df <- data.frame("Users" =c('A',"B","A",'C','B'), "Date" = c('17-03-2019','15-03-2019','11-03-2019','20-04-2019',"21-04-2019"), "Systems" = c("Sys1", "Sys1","Sys2","Sys3","Sys4"), stringsAsFactors = FALSE)
df
  Users       Date Systems
1     A 17-03-2019    Sys1
2     B 15-03-2019    Sys1
3     A 11-03-2019    Sys2
4     C 20-04-2019    Sys3
5     B 21-04-2019    Sys4

Data frame 2:

loc_df<-data.frame("Locations" =c('loc1','loc1','loc2','loc2','loc3'),"Systems" = c("Sys1","Sys2","Sys3","Sys4","Sys6"), stringsAsFactors = FALSE)
loc_df

  Locations Systems
1      loc1    Sys1
2      loc1    Sys2
3      loc2    Sys3
4      loc2    Sys4
5      loc3    Sys6

Frequency count code:

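library(dplyr)

# Merging df: full outer join on the shared Systems column
# (same result as plyr::join(df, loc_df, type = "full"))
merge_df <- full_join(df, loc_df, by = "Systems")
# Replacing NAs with 0
merge_df[is.na(merge_df)] <- 0
merge_df

# Code for frequency count
merge_df %>%
  group_by(Systems, Locations) %>%
  summarise(frequency = n())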

Current output:

  Systems Locations frequency
  <chr>   <chr>         <int>
1 Sys1    loc1              2
2 Sys2    loc1              1
3 Sys3    loc2              1
4 Sys4    loc2              1
5 Sys6    loc3              1

Expected output:

  Systems Locations frequency
  <chr>   <chr>         <int>
1 Sys1    loc1              2
2 Sys2    loc1              1
3 Sys3    loc2              1
4 Sys4    loc2              1
5 Sys6    loc3              0

Recommended answer


As the NAs have already been changed to 0 (merge_df[is.na(merge_df)] <- 0), we can do a logical evaluation and take its sum instead of using n(). n() returns the number of rows in each group, and after the full join a row for Sys6 is already present, so n() counts it as 1.

library(dplyr)
merge_df %>% 
   group_by(Systems, Locations) %>%
   summarise(frequency = sum(Date != 0))
# A tibble: 5 x 3
# Groups:   Systems [5]
#  Systems Locations frequency
#  <chr>   <chr>         <int>
#1 Sys1    loc1              2
#2 Sys2    loc1              1
#3 Sys3    loc2              1
#4 Sys4    loc2              1
#5 Sys6    loc3              0
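For reference, here is a self-contained sketch that puts the original n() count and the conditional sum(Date != 0) count side by side, so the difference for Sys6 is visible. It assumes dplyr >= 1.0 (for the `.groups` argument of `summarise()`) and uses `full_join()` for the merge; the `rows` column name is only illustrative.

```r
library(dplyr)

# Usage data and location data from the question
df <- data.frame(Users = c("A", "B", "A", "C", "B"),
                 Date = c("17-03-2019", "15-03-2019", "11-03-2019",
                          "20-04-2019", "21-04-2019"),
                 Systems = c("Sys1", "Sys1", "Sys2", "Sys3", "Sys4"),
                 stringsAsFactors = FALSE)
loc_df <- data.frame(Locations = c("loc1", "loc1", "loc2", "loc2", "loc3"),
                     Systems = c("Sys1", "Sys2", "Sys3", "Sys4", "Sys6"),
                     stringsAsFactors = FALSE)

# The full join keeps one row for Sys6, with NA in Users and Date
merge_df <- full_join(df, loc_df, by = "Systems")
# As in the question, NAs become "0" (the columns are character)
merge_df[is.na(merge_df)] <- 0

counts <- merge_df %>%
  group_by(Systems, Locations) %>%
  summarise(rows = n(),                 # counts every row, including Sys6's placeholder
            frequency = sum(Date != 0), # counts only rows that carry a real date
            .groups = "drop")
counts
```

The `rows` column reproduces the question's output (1 for Sys6), while `frequency` gives the expected 0.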


Instead of changing the NAs to 0, this can also be done with sum(!is.na(Date)) after skipping the replacement step, which is arguably better because NA represents a missing date more appropriately than 0.
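A minimal sketch of that NA-based variant, again assuming dplyr >= 1.0 and `full_join()` for the merge. The line `merge_df[is.na(merge_df)] <- 0` is simply dropped, so Sys6 keeps an NA Date and `!is.na(Date)` is FALSE for that row.

```r
library(dplyr)

df <- data.frame(Users = c("A", "B", "A", "C", "B"),
                 Date = c("17-03-2019", "15-03-2019", "11-03-2019",
                          "20-04-2019", "21-04-2019"),
                 Systems = c("Sys1", "Sys1", "Sys2", "Sys3", "Sys4"),
                 stringsAsFactors = FALSE)
loc_df <- data.frame(Locations = c("loc1", "loc1", "loc2", "loc2", "loc3"),
                     Systems = c("Sys1", "Sys2", "Sys3", "Sys4", "Sys6"),
                     stringsAsFactors = FALSE)

# Keep the NAs produced by the full join; do not replace them with 0
merge_df <- full_join(df, loc_df, by = "Systems")

counts <- merge_df %>%
  group_by(Systems, Locations) %>%
  # Count only rows where a usage date is actually recorded
  summarise(frequency = sum(!is.na(Date)), .groups = "drop")
counts
```

Keeping NA also avoids mixing the placeholder "0" into the character Users and Date columns.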
