分割多个日期和时间变量& R中的平均计算时间 [英] Splitting multiple date and time variables & computing time average in R

查看:94
本文介绍了分割多个日期和时间变量& R中的平均计算时间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下数据集,其中,我拥有他们所居住的人员的ID,地区和街道,以及他们向服务器上传数据的最后日期/时间。变量 last_down_包含一个人上载数据的最后日期/时间,并以这样的方式命名:他们显示我在同一日期下载数据的日期。例如, last_upload_2020-06-12表示我是在6月12日从服务器下载数据的。

I have the following dataset wherein, I have the person's ID, district and sub-district they live in along with the last date/time on which they uploaded data to the server. The variables "last_down_" contain the last date/time on which a person the uploaded data and are named in such a way that they show the date on which I had downloaded the data on the same. For example, "last_upload_2020-06-12" would mean I downloaded the data from the server on 12th June.

对于以下数据集,我想将日期和时间溢出到每个变量中(一次全部)所创建的新的分离变量以名称 last_date_(我的下载日期)命名的方式; & " last_time_(我的下载日期)"

For the below dataset, I would like to spilt the date and time in each of the variables (all at once) in a way that the new separated variables which are created go by the name "last_date_(my download date)" & "last_time_(my download date)"

 district block id  last_upload_2020-06-12 last_upload_2020-06-13 last_upload_2020-06-14 last_upload_2020-06-15
    A   X   11  2020-02-06 11:53:19.0   2020-02-06 11:53:19.0   2020-02-06 11:53:19.0   2020-02-06 11:53:19.0
    A   X   12  2020-06-11 12:40:26.0   2020-06-11 12:40:26.0   2020-06-14 11:40:26.0   2020-06-15 18:50:26.0
    A   X                                                       2020-06-14 11:08:12.0   2020-06-14 11:08:12.0
    A   X   14  2020-06-12 11:31:07.0   2020-06-13 11:31:07.0   2020-06-14 17:37:07.0   2020-06-14 17:37:07.0
    A   Y   15  2020-06-10 12:45:48.0   2020-06-10 12:45:48.0   2020-06-10 12:45:48.0   2020-06-10 12:45:48.0
    A   Y   16  2020-04-04 02:26:57.0   2020-04-04 02:26:57.0   2020-04-04 02:26:57.0   2020-04-04 02:26:57.0
    A   Y   17  2020-03-31 08:10:03.0   2020-03-31 08:10:03.0   2020-03-31 08:10:03.0   2020-03-31 08:10:03.0
    A   Y   18  2020-05-30 12:08:15.0   2020-05-30 12:08:15.0   2020-05-30 12:08:15.0   2020-05-30 12:08:15.0
    A   Z   19  2020-04-09 15:21:52.0   2020-04-09 15:21:52.0   2020-04-09 15:21:52.0   2020-04-09 15:21:52.0
    A   Z   20  2020-05-30 17:42:33.0   2020-05-30 17:42:33.0   2020-05-30 17:42:33.0   2020-05-30 17:42:33.0
    A   Z   21  2020-04-12 14:23:29.0   2020-04-12 14:23:29.0   2020-04-12 14:23:29.0   2020-04-12 14:23:29.0
    A   Z   22  2020-05-13 23:18:19.0   2020-05-13 23:18:19.0   2020-05-13 23:18:19.0   2020-05-13 23:18:19.0
    A   X   23  2020-04-30 09:53:31.0   2020-04-30 09:53:31.0   2020-04-30 09:53:31.0   2020-04-30 09:53:31.0
    A   X   24  2020-06-10 10:28:59.0   2020-06-10 10:28:59.0   2020-06-10 10:28:59.0   2020-06-15 11:31:33.0
    A   Y   25              
    A   Y   26  2020-05-30 12:14:09.0   2020-05-30 12:14:09.0   2020-05-30 12:14:09.0   2020-05-30 12:14:09.0
    B   E   31              
    B   C   32  2020-06-12 16:43:23.0   2020-06-12 16:43:23.0   2020-06-12 16:43:23.0   2020-06-12 16:43:23.0
    B   C   33  2019-10-24 22:30:35.0   2019-10-24 22:30:35.0   2019-10-24 22:30:35.0   2019-10-24 22:30:35.0
    B   C   34  2020-06-09 15:38:18.0   2020-06-09 15:38:18.0   2020-06-09 15:38:18.0   2020-06-15 14:35:41.0
    B   C   35  2020-06-11 14:39:51.0   2020-06-11 14:39:51.0   2020-06-11 14:39:51.0   2020-06-11 14:39:51.0
    B   D   36  2020-06-12 11:53:15.0   2020-06-12 11:53:15.0   2020-06-12 11:53:15.0   2020-06-15 13:02:39.0
    B   D   37  2020-04-21 15:43:43.0   2020-04-21 15:43:43.0   2020-04-21 15:43:43.0   2020-04-21 15:43:43.0
    B   D   38  2020-05-13 04:07:17.0   2020-05-13 04:07:17.0   2020-05-13 04:07:17.0   2020-05-13 04:07:17.0
    B   E   39  2020-04-30 13:51:20.0   2020-04-30 13:51:20.0   2020-04-30 13:51:20.0   2020-04-30 13:51:20.0
    B   E   40  2020-05-12 16:51:01.0   2020-05-12 16:51:01.0   2020-05-12 16:51:01.0   2020-05-12 16:51:01.0
    B   E   41  2020-04-16 12:14:24.0   2020-04-16 12:14:24.0   2020-04-16 12:14:24.0   2020-04-16 12:14:24.0
    B   C   42  2018-06-07 15:12:18.0   2018-06-07 15:12:18.0   2018-06-07 15:12:18.0   2018-06-07 15:12:18.0
    B   D   43  2019-09-28 10:08:51.0   2019-09-28 10:08:51.0   2019-09-28 10:08:51.0   2019-09-28 10:08:51.0

NB:我的日期/时间变量是数字。

N.B: my date/time variables are numeric.

一旦我获得了数据,我也想做以下:

Once I get the data in shape, I would also like to do the following:


  1. 获取 last_upload_2020-06-12下所有观测值的年和月。

  1. Get the year and month of all observations under "last_upload_2020-06-12" in a separate column.

类似地,对于我的数据集中的最后一个日期 last_upload_2020-06-15。我可以自动选择R的最后日期,例如Sys.Date()-1吗?我将始终获得比当前日期少一个日期的数据。

Similarly, for the last date in my dataset that is "last_upload_2020-06-15". Can I automate R picking the last date something like Sys.Date()-1? I will always have the data for one date less than current.

计算每个ID的平均上传时间,即通常在几点钟左右一个人将数据上传到服务器?平均值应该基于唯一的时间值。

Calculate the average upload time per ID, i.e., generally around what time does a person upload data to the server? Average should be based on unique time values.

如果有人可以解决这个问题,将非常有帮助!

Would be extremely helpful if someone could help solve this!

谢谢,
Rachita

Thanks, Rachita

推荐答案

Df看起来太复杂了,我以为可能更好地复制它。
然后,我使用一个函数获取所需的每一列,并根据需要将其分为 last_date last_time 。在函数内部,临时DF是 cbind 到在循环外部构建的DF。该DF由循环中未处理的列组成。
此循环的结果是所需的DF。 [名字有点长]

The Df looked so complicated that I thought it might be better to replicate it. I then used a function to take every column you wanted and separate it into the last_date and last_time as wanted. Inside the function the temporary DF is cbind to a DF built outside of the loop. This DF consisted out of the columns which are not treated in the loop. The result of this loop is the DF as wanted. [colnames got a little long]

第二项任务的关键是将 last_time 转移到几个小时,然后进行分组和汇总。

The key for the second task was to transfer to last_time to hours, then grouping und summarizing.

我希望这就是您想要的。

I hope this is what you wanted.

我认为以此为基础,您可以应对第二名。

I think with this as a basis you can deal with no2.

有些警告与NA有关。

下面的reprex中有更多说明。

More explanation in the reprex below.

library(tidyverse)


df <- read.table(text = '
district block id  last_upload_2020_06_12 last_upload_2020_06_13 last_upload_2020_06_14 last_upload_2020_06_15
"A"   "X"   11  "2020-02-06 11:53:19.0"   "2020-02-06 11:53:19.0"   "2020-02-06 11:53:19.0"   "2020-02-06 11:53:19.0"
"A"   "X"   12  "2020-06-11 12:40:26.0"   "2020-06-11 12:40:26.0"   "2020-06-14 11:40:26.0"   "2020-06-15 18:50:26.0"
"A"   "X"   NA  "NA"                      "NA"                      "2020-06-14 11:0812.0"    "2020-06-14 11:0812.0"
"A"   "X"   14  "2020-06-12 11:31:07.0"   "2020-06-13 11:31:07.0"   "2020-06-14 17:37:07.0"   "2020-06-14 17:37:07.0"
"A"   "Y"   15  "2020-06-10 12:45:48.0"   "2020-06-10 12:45:48.0"   "2020-06-10 12:45:48.0"   "2020-06-10 12:45:48.0"
"A"   "Y"   16  "2020-04-04 02:26:57.0"   "2020-04-04 02:26:57.0"   "2020-04-04 02:26:57.0"   "2020-04-04 02:26:57.0"
"A"   "Y"   17  "2020-03-31 08:10:03.0"   "2020-03-31 08:10:03.0"   "2020-03-31 08:10:03.0"   "2020-03-31 08:10:03.0"
"A"   "Y"   18  "2020-05-30 12:08:15.0"   "2020-05-30 12:08:15.0"   "2020-05-30 12:08:15.0"   "2020-05-30 12:08:15.0"
"A"   "Z"   19  "2020-04-09 15:21:52.0"   "2020-04-09 15:21:52.0"   "2020-04-09 15:21:52.0"   "2020-04-09 15:21:52.0"
"A"   "Z"   20  "2020-05-30 17:42:33.0"   "2020-05-30 17:42:33.0"   "2020-05-30 17:42:33.0"   "2020-05-30 17:42:33.0"
"A"   "Z"   21  "2020-04-12 14:23:29.0"   "2020-04-12 14:23:29.0"   "2020-04-12 14:23:29.0"   "2020-04-12 14:23:29.0"
"A"   "Z"   22  "2020-05-13 23:18:19.0"   "2020-05-13 23:18:19.0"   "2020-05-13 23:18:19.0"   "2020-05-13 23:18:19.0"
"A"   "X"   23  "2020-04-30 09:53:31.0"   "2020-04-30 09:53:31.0"   "2020-04-30 09:53:31.0"   "2020-04-30 09:53:31.0"
"A"   "X"   24  "2020-06-10 10:28:59.0"   "2020-06-10 10:28:59.0"   "2020-06-10 10:28:59.0"   "2020-06-15 11:31:33.0"
"A"   "Y"   25  " "                        ""                     ""                         ""
"A"   "Y"   26  "2020-05-3012:14:09.0"   "2020-05-30 12:14:09.0"   "2020-05-30 12:14:09.0"   "2020-05-30 12:14:09.0"
"B"   "E"   31  ""            ""                      ""                        ""          ""
"B"   "C"   32  "2020-06-1 16:43:23.0"   "2020-06-12 16:43:23.0"   "2020-06-12 16:43:23.0"   "2020-06-12 16:43:23.0"
"B"   "C"   33  "2019-10-24 22:30:35.0"   "2019-10-24 22:30:35.0"   "2019-10-24 22:30:35.0"   "2019-10-24 22:30:35.0"
"B"   "C"   34  "2020-06-09 15:38:18.0"   "2020-06-09 15:38:18.0"   "2020-06-09 15:38:18.0"   "2020-06-15 14:35:41.0"
"B"   "C"   35  "2020-06-11 14:39:51.0"   "2020-06-11 14:39:51.0"   "2020-06-11 14:39:51.0"   "2020-06-11 14:39:51.0"
"B"   "D"   36  "2020-06-12 11:53:15.0"   "2020-06-12 11:53:15.0"   "2020-06-12 11:53:15.0"   "2020-06-15 13:02:39.0"
"B"   "D"   37  "2020-04-21 15:43:43.0"   "2020-04-21 15:43:43.0"   "2020-04-21 15:43:43.0"   "2020-04-21 15:43:43.0"
"B"   "D"   38  "2020-05-13 04:07:17.0"   "2020-05-13 04:07:17.0"   "2020-05-13 04:07:17.0"   "2020-05-13 04:07:17.0"
"B"   "E"   39  "2020-04-30 13:51:20.0"   "2020-04-30 13:51:20.0"   "2020-04-30 13:51:20.0"   "2020-04-30 13:51:20.0"
"B"   "E"   40  "2020-05-12 16:51:01.0"   "2020-05-12 16:51:01.0"   "2020-05-12 16:51:01.0"   "2020-05-12 16:51:01.0"
"B"   "E"   41  "2020-04-16 12:14:24.0"   "2020-04-16 12:14:24.0"   "2020-04-16 12:14:24.0"   "2020-04-16 12:14:24.0"
"B"   "C"   42  "2018-06-07 15:12:18.0"   "2018-06-07 15:12:18.0"   "2018-06-07 15:12:18.0"   "2018-06-07 15:12:18.0"
"B"   "D"   43  "2019-09-28 10:08:51.0"   "2019-09-28 10:08:51.0"   "2019-09-28 10:08:51.0"   "2019-09-28 10:08:51.0"
           ', header =T)


# TASK: create for each column which contains 'last_upload' new columns 
# with date and time

# get the colnames of the cols to be split or separated
ccl <- colnames(df %>% select(last_upload_2020_06_12:last_upload_2020_06_15))

# create new DF with first 3 columns, to which other columns are bound in
# the following function
dff <- df %>% select(district:id) 

# function to separate each col in ccl to _date and _time
for (cl in ccl) {
  tmp <- separate(df,
    col = cl, sep = " ",
    into = c(paste0(cl, "_date"), paste0(cl, "_time"))
  ) %>%
    select(contains("_date") | contains("_time"))
  dff <- cbind(dff, tmp)
}


dff %>% head()
#>   district block id last_upload_2020_06_12_date last_upload_2020_06_12_time
#> 1        A     X 11                  2020-02-06                  11:53:19.0
#> 2        A     X 12                  2020-06-11                  12:40:26.0
#> 3        A     X NA                        <NA>                        <NA>
#> 4        A     X 14                  2020-06-12                  11:31:07.0
#> 5        A     Y 15                  2020-06-10                  12:45:48.0
#> 6        A     Y 16                  2020-04-04                  02:26:57.0
#>   last_upload_2020_06_13_date last_upload_2020_06_13_time
#> 1                  2020-02-06                  11:53:19.0
#> 2                  2020-06-11                  12:40:26.0
#> 3                        <NA>                        <NA>
#> 4                  2020-06-13                  11:31:07.0
#> 5                  2020-06-10                  12:45:48.0
#> 6                  2020-04-04                  02:26:57.0
#>   last_upload_2020_06_14_date last_upload_2020_06_14_time
#> 1                  2020-02-06                  11:53:19.0
#> 2                  2020-06-14                  11:40:26.0
#> 3                  2020-06-14                   11:0812.0
#> 4                  2020-06-14                  17:37:07.0
#> 5                  2020-06-10                  12:45:48.0
#> 6                  2020-04-04                  02:26:57.0
#>   last_upload_2020_06_15_date last_upload_2020_06_15_time
#> 1                  2020-02-06                  11:53:19.0
#> 2                  2020-06-15                  18:50:26.0
#> 3                  2020-06-14                   11:0812.0
#> 4                  2020-06-14                  17:37:07.0
#> 5                  2020-06-10                  12:45:48.0
#> 6                  2020-04-04                  02:26:57.0

# TASK: Calculate the average time of a day each id does a download
# new DF from original brought into long format
# split the date/time into last_date and last_time
ddf <- df %>%
  pivot_longer(cols = last_upload_2020_06_12:last_upload_2020_06_15) %>%
  separate(col = value, sep = ' ', into = c('last_date', 'last_time')) %>%
  mutate(last_date = lubridate::ymd(last_date), last_time= lubridate::hms(last_time))


# calculating the mean hour of the day at which each id does a 
# download, by calculating last_time to hours (of the day) and
# after grouping build mean hour
ddf %>% 
  mutate(hours = as.numeric(lubridate::hms(last_time), unit = 'hour')) %>%
  group_by(id) %>% summarise(meanHourOfTheDay = mean(hours, na.rm = T))

#> # A tibble: 29 x 2
#>       id meanHourOfTheDay
#>    <int>            <dbl>
#>  1    11            11.9 
#>  2    12            14.0 
#>  3    14            14.6 
#>  4    15            12.8 
#>  5    16             2.45
#>  6    17             8.17
#>  7    18            12.1 
#>  8    19            15.4 
#>  9    20            17.7 
#> 10    21            14.4 
#> # … with 19 more rows

这篇关于分割多个日期和时间变量&amp; R中的平均计算时间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆