R: Separating out a mixed data column, date above multiple times

Problem description

I have a data.frame in which one column holds a date followed by a sequence of times that belong to that date, and I'd like to convert it into a POSIX date-time field.

For example:

"7/16/2014", "5:06:59 PM", "11:51:26 AM", "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM", "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM", "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM"

Conceptually, the approach seems to be: use regular expressions to copy this mixed vector into a DATEONLY vector and a TIMEONLY vector that keep the same positions, then use something like tidyr's fill() to fill in the blank spots in the DATEONLY vector, and finally recombine the DATEONLY and TIMEONLY columns. But I'm a bit stumped as to where to start.
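A minimal sketch of that idea, assuming the tidyr package is installed and that every date is written as month/day/year. The names `is_date`, `df`, `date_only`, `time_only` and `combined` are illustrative, not from the original post. The mixed vector is copied into two position-aligned columns, the dates are filled downward with tidyr's `fill()`, and each date is pasted onto its times.

```r
library(tidyr)  # provides fill()

x <- c("7/16/2014", "5:06:59 PM", "11:51:26 AM",
       "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM",
       "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM",
       "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM")

# TRUE where the element looks like month/day/year (assumed date format)
is_date <- grepl("^[0-9]+/[0-9]+/[0-9]+$", x)

# Split the mixed vector into two columns that keep the original positions:
# a date goes to date_only, a time to time_only, and the other slot is NA
df <- data.frame(
  date_only = ifelse(is_date, x, NA_character_),
  time_only = ifelse(is_date, NA_character_, x),
  stringsAsFactors = FALSE
)

# Replace each missing date with the nearest non-missing date above it
df <- fill(df, date_only, .direction = "down")

# Drop the rows that held only a date, then recombine date and time
df <- df[!is.na(df$time_only), ]
combined <- paste(df$date_only, df$time_only)
combined
```

The result is still a character vector; it has the "date time" layout asked for, ready to be parsed into date-time objects.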

I'd like the result to look like this:

"7/16/2014 5:06:59 PM", "7/16/2014 11:51:26 AM", "7/13/2014 3:53:16 PM" etc...

Solution

This is not a particularly concise way to do it, but the following works. I could not come up with a good way to split the vector (i.e., x) directly, so I worked with a data frame instead. First, I created a group variable: as you suggested in your question, I searched for the indices of the dates (month/day/year). Using those indices and na.locf(), I filled in the group column. Then I split the data by group and pasted each date onto its times with stri_join(). Finally, I unlisted the result. If you want date-time objects rather than character strings, you still need to convert them.

library(zoo)      # na.locf()
library(magrittr) # %>%
library(stringi)  # stri_join()

x <- c("7/16/2014", "5:06:59 PM", "11:51:26 AM",
       "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM",
       "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM",
       "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM")

# Create a data frame (stringsAsFactors = FALSE keeps date as character in R < 4.0)
mydf <- data.frame(date = x, group = NA, stringsAsFactors = FALSE)

# Get indices for date (month/day/year)
ind <- grep(pattern = "\\d+/\\d+/\\d+", x = mydf$date)

# Number each date row, then carry that number forward over the
# NA (time) rows below it with na.locf()

mydf$group[ind] <- seq_along(ind)
mydf$group <- na.locf(mydf$group)

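# Split the data frame by group and paste each date onto its times
# (the result is a named character vector)
split(mydf, mydf$group) %>%
  lapply(function(d) {
    # d$date[1] is the date; d$date[-1] are the times that follow it
    # (unlike 2:length(), [-1] yields nothing for a date with no times)
    stri_join(d$date[1], d$date[-1], sep = " ")
  }) %>%
  unlist()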


                     11                      12                      21                      22 
"7/16/2014 5:06:59 PM" "7/16/2014 11:51:26 AM"  "7/13/2014 3:53:16 PM"  "7/13/2014 3:24:19 PM" 
                     23                       3                       4                      51 
"7/13/2014 11:47:49 AM" "7/12/2014 11:57:41 AM" "7/11/2014 10:01:48 AM" "7/10/2014 4:54:08 PM" 
                     52                      53 
"7/10/2014 2:23:04 PM" "7/10/2014 11:34:09 AM" 
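The answer stops at character strings. One compact base-R alternative, sketched here under the assumptions that the vector always starts with a date and that times use a 12-hour clock with an AM/PM marker, replaces the `na.locf()` grouping with `cumsum()` over a date indicator and then parses the result with `as.POSIXct()`. The variable names are illustrative, and `tz = "UTC"` is an assumed time zone; substitute the one the data was recorded in.

```r
x <- c("7/16/2014", "5:06:59 PM", "11:51:26 AM",
       "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM",
       "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM",
       "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM")

# TRUE at each month/day/year entry
is_date <- grepl("^[0-9]+/[0-9]+/[0-9]+$", x)
stopifnot(is_date[1])  # assumption: the first element is a date

# cumsum() goes up by one at every date, so each time gets the same
# group number as the date above it (the role na.locf() plays in the answer)
group <- cumsum(is_date)

# Look up each element's date by group number, keep only the time entries,
# and paste date and time together
dates <- x[is_date][group]
combined <- paste(dates[!is_date], x[!is_date])

# %m/%d/%Y = month/day/year, %I = 12-hour clock, %p = AM/PM marker.
# %p depends on the locale, so LC_TIME is set to "C" for the parse and restored.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
dt <- as.POSIXct(combined, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old_locale))
dt
```

Indexing `x[is_date]` by `group` avoids the data frame, `split()` and `lapply()` entirely; the trade-off is that a leading time with no date above it would silently misalign the result, which is why the sketch checks that the first element is a date.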