Summing columns based on criteria

Question

I have a dataframe consisting of three columns: x, ID and date_time. The "x" column is a recording of a variable x, ID indicates what is being recorded, and date_time indicates when it was recorded. A sample of the dataframe is given below.

From this dataframe I would like to calculate a new dataframe with seven columns: "Measurement", "ID", "Date", "x_4_10_day", "Day_total", "x_4_10_night" and "Night_total".

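  1. "Measurement": the number of the measurement for a given ID. A measurement starts at 23:00:00 and runs until 22:59:59 the next day. However, recording starts at a random time, so the first measurement does not last 24 hours, and neither does the last one.
  2. "ID": the ID of the given measurement.
  3. "Date": the date of the last recording in the given measurement, in the format yyyy.mm.dd.
  4. "x_4_10_day": each measurement is split into a day (7:00:00-22:59:59) and a night (23:00:00-6:59:59). This column should give the total time (in minutes) during the day of the given measurement in which x is between 4 and 10 (both inclusive). Each recording with x between 4 and 10 counts as 5 minutes, since recordings are 5 minutes apart.
  5. "Day_total": the total time (in minutes) that x was measured during the day. x has missing values, which are left blank, and they should be subtracted: for each missing recording, 5 minutes should be subtracted from the total. In addition, some measurements start later than 7:00.
  6. "x_4_10_night": the total time (in minutes) during the night of the given measurement in which x is between 4 and 10 (both inclusive).
  7. "Night_total": the total time (in minutes) that x was measured during the night. Missing values of x are left blank, and for each missing recording 5 minutes should be subtracted from the total.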
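The 23:00 cutoff and the day/night split described in the list above can be sketched in base R as follows. This is a minimal illustration, not the asker's code: the helper name `classify_recording` is hypothetical, and timestamps are assumed to be UTC in the `yyyy.mm.dd hh:mm:ss` format used in the data.

```r
# Hypothetical helper (not from the question): assigns each recording to a
# measurement date and a day/night period. Timestamps are assumed to be UTC.
classify_recording <- function(date_time) {
  ts   <- as.POSIXct(date_time, format = "%Y.%m.%d %H:%M:%S", tz = "UTC")
  secs <- as.numeric(ts) %% 86400            # seconds since midnight (UTC)
  day  <- as.Date(ts, tz = "UTC")
  # From 23:00:00 onwards a recording belongs to the measurement that ends on
  # the next calendar day, so it is labelled with that next day
  meas_date <- day + as.integer(secs >= 23 * 3600)
  # Day is 07:00:00-22:59:59, night is 23:00:00-06:59:59
  period <- ifelse(secs >= 7 * 3600 & secs < 23 * 3600, "day", "night")
  data.frame(date_time = date_time,
             date = format(meas_date, "%Y.%m.%d"),
             period = period,
             stringsAsFactors = FALSE)
}

classify_recording(c("2020.03.02 22:00:17", "2020.03.02 23:05:17",
                     "2020.03.03 06:59:59", "2020.03.03 07:00:00"))
```

Working in seconds since midnight keeps both boundaries (07:00:00 and 23:00:00) as simple half-open comparisons, which matches the inclusive/exclusive ends stated in the requirements.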

There should be one row for every unique measurement. So far I have code that correctly returns the columns "Measurement", "ID" and "Date":

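library(dplyr)

# date_time is stored as text, so parse it (as UTC) before converting to seconds
df1$mydate = as.Date(df1$date_time, format = "%Y.%m.%d %H:%M:%S")
df1$tm <- as.numeric(as.POSIXct(df1$date_time, format = "%Y.%m.%d %H:%M:%S", tz = "UTC"))
df1$dts <- 86400*as.numeric(df1$mydate)   # midnight of mydate, in seconds
df2 <- df1 %>% 
group_by(id,mydate) %>% 
# recordings from 23:00 onwards are labelled with the next day
transform(date = case_when(((dts-3600)<tm & tm<(dts+82800)) ~paste0(mydate), ((dts+82800)<=tm) ~paste0(mydate+1) )) %>% 
select(id,date) %>%   
unique() %>% 
group_by(id) %>% 
mutate(measurement = row_number())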
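The constants in the code above are offsets in seconds from midnight: `dts` is midnight of `mydate`, so `dts + 82800` is 23:00 that day (23 × 3600 s) and `dts - 3600` is 23:00 the day before. A quick check in base R, assuming the timestamp is parsed as UTC so that `tm` and `dts` share the same origin:

```r
# One timestamp from the question's data, parsed as UTC (an assumption)
ts     <- as.POSIXct("2020.03.02 23:05:17", format = "%Y.%m.%d %H:%M:%S", tz = "UTC")
mydate <- as.Date("2020.03.02 23:05:17", format = "%Y.%m.%d %H:%M:%S")

tm  <- as.numeric(ts)               # seconds since 1970-01-01 00:00:00 UTC
dts <- 86400 * as.numeric(mydate)   # seconds at midnight of mydate

tm - dts                  # seconds after midnight: 23*3600 + 5*60 + 17
tm >= dts + 23 * 3600     # TRUE, so this recording is labelled mydate + 1
```

If `date_time` were parsed in a local time zone instead, `tm` would be shifted by the UTC offset relative to `dts`, and the 23:00 cutoff would move accordingly.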

However, I don't know how to compute the remaining four columns.

This is the expected output:

dummy_output <- read.table(header=TRUE, text ="
                     ID Date        Measurement x_4_10_day Day_total x_4_10_night Night_total
                     12 2020.03.02  1           30         40        0            0
                     12 2020.03.03  2           0          0         45           75
                     13 2020.05.09  1           90         90        0            0
") 

Any suggestions are much appreciated, thanks!

Here is the data:

df1 <- structure(list(date_time = c("2020.03.02 22:00:17", "2020.03.02 22:05:17", 
"2020.03.02 22:10:17", "2020.03.02 22:35:17", "2020.03.02 22:40:17", 
"2020.03.02 22:45:17", "2020.03.02 22:50:17", "2020.03.02 22:55:17", 
"2020.03.02 23:00:17", "2020.03.02 23:05:17", "2020.03.02 23:10:17", 
"2020.03.02 23:15:17", "2020.03.02 23:20:17", "2020.03.02 23:25:17", 
"2020.03.02 23:30:17", "2020.03.02 23:35:17", "2020.03.02 23:40:17", 
"2020.03.02 23:45:17", "2020.03.02 23:50:17", "2020.03.02 23:55:17", 
"2020.03.03 00:00:17", "2020.03.03 00:55:17", "2020.03.03 01:00:17", 
"2020.03.03 01:05:17", "2020.03.03 01:10:17", "2020.03.03 01:15:17", 
"2020.03.03 01:20:17", "2020.03.03 01:25:17", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32"), id = c(12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
13L, 13L, 13L, 13L, 13L), x = c("7.55", "4.55", "4.55", "12", 
"12", "10", "10", "4.3", "", "", "4.3", "4.3", "4.3", "", "4.3", 
"12", "12", "12", "2", "12", "12", "", "8", "3", "3", "2", "2", 
"", "12", "10", "10", "4.3", "4.3", "4.3", "4.3", "4.3", "4.3", 
"4.3", "4.3", "12", "12", "12", "12", "12", "12", "12")), row.names = c(NA, 
46L), class = "data.frame")

Recommended answer

I have added id = 14, which has only night-time values, to your dataframe. Perhaps this is what you are looking for. Note that your expected values do not fully agree with your stated requirements.

df11 <- structure(list(date_time = c("2020.03.02 22:00:17", "2020.03.02 22:05:17", 
                             "2020.03.02 22:10:17", "2020.03.02 22:35:17", "2020.03.02 22:40:17", 
                             "2020.03.02 22:45:17", "2020.03.02 22:50:17", "2020.03.02 22:55:17", 
                             "2020.03.02 23:00:17", "2020.03.02 23:05:17", "2020.03.02 23:10:17", 
                             "2020.03.02 23:15:17", "2020.03.02 23:20:17", "2020.03.02 23:25:17", 
                             "2020.03.02 23:30:17", "2020.03.02 23:35:17", "2020.03.02 23:40:17", 
                             "2020.03.02 23:45:17", "2020.03.02 23:50:17", "2020.03.02 23:55:17", 
                             "2020.03.03 00:00:17", "2020.03.03 00:55:17", "2020.03.03 01:00:17", 
                             "2020.03.03 01:05:17", "2020.03.03 01:10:17", "2020.03.03 01:15:17", 
                             "2020.03.03 01:20:17", "2020.03.03 01:25:17", "2020.05.09 08:39:32", 
                             "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                             "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                             "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                             "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                             "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                             "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                             "2020.03.02 23:45:17", "2020.03.02 23:50:17", "2020.03.02 23:55:17", 
                             "2020.03.03 00:00:17", "2020.03.03 00:55:17", "2020.03.03 01:00:17" 
                             ), 
                      x = c("7.55", "4.55", "4.55", "12", 
                            "12", "10", "10", "4.3", "", "", "4.3", "4.3", "4.3", "", "4.3", 
                            "12", "12", "12", "2", "12", "12", "", "8", "3", "3", "2", "2", 
                            "", "12", "10", "10", "4.3", "4.3", "4.3", "4.3", "4.3", "4.3", 
                            "4.3", "4.3", "12", "12", "12", "12", "12", "12", "12",
                            "12", "10", "10", "4.3", "4.3", "4.3"),
               id = c(12L, 12L, 
                      12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
                      12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
                      13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
                      13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L)), 
               row.names = c(NA, 52L), class = "data.frame")

library(dplyr)
library(tidyr)      # pivot_wider(), replace_na()
library(lubridate)  # as_datetime()

df11$xn <- as.numeric(df11$x)   # blank x values become NA
# xmin: 5 minutes if 4 <= x <= 10; xmint: +5 for a recorded x, -5 for a missing one
df1 <- df11 %>% transform(xmin = ifelse((xn<4 | xn>10 | is.na(xn)),0,5 ),
                          xmint = ifelse(is.na(xn),-5,5 ))
df1$dateTime = as_datetime(df1$date_time, format = "%Y.%m.%d %H:%M:%S")   # parsed as UTC
df1$mydate = as.Date(df1$date_time, format = "%Y.%m.%d %H:%M:%S")

df1$tm <- as.numeric(df1$dateTime)        # seconds since epoch
df1$dts <- 86400*as.numeric(df1$mydate)   # midnight of mydate, in seconds

df2 <- df1 %>% group_by(id,mydate) %>% 
         # recordings from 23:00 onwards count towards the next day's measurement
         transform(date = case_when(((dts-3600)<tm & tm<(dts+82800) )~paste0(mydate),((dts+82800)<=tm)~paste0(mydate+1) )) %>%
         # 07:00:00-22:59:59 is day, everything else is night
         transform(dayrnight = ifelse((tm>=(dts+25200) & tm<(dts+82800) ),'day','night' ) ) %>% 
         group_by(id,date,dayrnight) %>% 
         dplyr::summarise(x_4_10 = sum(xmin), total = sum(xmint)) %>% 
         pivot_wider(id_cols = c(id,date), names_from = dayrnight, values_from = c("x_4_10", "total")) %>% 
         # a measurement without day (or night) recordings gets 0 instead of NA
         mutate_if(is.numeric , replace_na, replace = 0) %>% 
         group_by(id) %>% mutate(measurement = row_number()) %>% 
         select(id,date,measurement,x_4_10_day,total_day,x_4_10_night,total_night)

> df2
# A tibble: 4 x 7
# Groups:   id [3]
     id date       measurement x_4_10_day total_day x_4_10_night total_night
  <int> <chr>            <int>      <dbl>     <dbl>        <dbl>       <dbl>
1    12 2020-03-02           1         30        40            0           0
2    12 2020-03-03           2          0         0           25          50
3    13 2020-05-09           1         50        90            0           0
4    14 2020-03-03           1          0         0           25          30
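For readers without the tidyverse, the same summary can be sketched in base R. This is an alternative, not the answerer's code: the function name `summarise_measurements` is hypothetical, timestamps are assumed to be UTC, and missing values follow the same rule as `xmint` in the answer (+5 minutes per recorded value, -5 per missing one). Unlike the tibble above, it formats the date as yyyy.mm.dd, as the question asked.

```r
# Hypothetical base-R equivalent of the dplyr/tidyr pipeline in the answer.
# Expects a data frame with columns date_time ("yyyy.mm.dd hh:mm:ss"), id and x.
summarise_measurements <- function(df) {
  ts   <- as.POSIXct(df$date_time, format = "%Y.%m.%d %H:%M:%S", tz = "UTC")
  secs <- as.numeric(ts) %% 86400                  # seconds since midnight
  day  <- as.Date(ts, tz = "UTC")

  # Recordings from 23:00 onwards are labelled with the next day
  df$date   <- format(day + as.integer(secs >= 23 * 3600), "%Y.%m.%d")
  df$period <- ifelse(secs >= 7 * 3600 & secs < 23 * 3600, "day", "night")

  xn <- suppressWarnings(as.numeric(df$x))         # blank x becomes NA
  df$in_range <- ifelse(!is.na(xn) & xn >= 4 & xn <= 10, 5, 0)
  df$minutes  <- ifelse(is.na(xn), -5, 5)           # same rule as xmint

  # Sum the minutes per id, measurement date and period
  agg <- aggregate(cbind(in_range, minutes) ~ id + date + period,
                   data = df, FUN = sum)

  # One row per measurement; a period with no recordings gets 0
  keys <- unique(df[order(df$id, df$date), c("id", "date")])
  pick <- function(col, per) {
    sub <- agg[agg$period == per, ]
    v <- sub[[col]][match(paste(keys$id, keys$date), paste(sub$id, sub$date))]
    ifelse(is.na(v), 0, v)
  }
  out <- data.frame(id = keys$id, date = keys$date,
                    x_4_10_day   = pick("in_range", "day"),
                    total_day    = pick("minutes", "day"),
                    x_4_10_night = pick("in_range", "night"),
                    total_night  = pick("minutes", "night"),
                    stringsAsFactors = FALSE)
  # Number the measurements within each id (rows are already sorted by date)
  out$measurement <- ave(seq_len(nrow(out)), out$id, FUN = seq_along)
  out[, c("id", "date", "measurement", "x_4_10_day", "total_day",
          "x_4_10_night", "total_night")]
}
```

Applied to the question's data, this gives the same minute totals as the tibble above for IDs 12 and 13. `aggregate()` plus `match()` stands in for `summarise()` plus `pivot_wider()`; the design choice is simply to avoid package dependencies.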