R: Separating out a mixed data column, date above multiple times
Problem description
I have a data.frame in which one vector has a date above a sequence of times, repeated for several dates, and I'd like to convert it into some kind of POSIX date-time field.
For example:
"7/16/2014", "5:06:59 PM", "11:51:26 AM", "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM", "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM", "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM"
Conceptually, the approach seems to be: use regular expressions to copy this MIXED vector into a DATEONLY vector and a TIMEONLY vector, so that both keep the same positions; then use something like the fill function from tidyr to fill in the blank spots in the DATEONLY vector; then recombine the DATEONLY and TIMEONLY columns... but I'm a bit stumped as to where to start.
I'd like the result to look like this:
"7/16/2014 5:06:59 PM", "7/16/2014 11:51:26 AM", "7/13/2014 3:53:16 PM" etc...
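A minimal sketch of the approach described in the question, assuming the tidyr package is installed: split the mixed vector into a date column and a time column with a regular expression, carry each date downward with tidyr::fill(), and paste the two columns back together. The column names dateonly and timeonly are illustrative only, and the pattern assumes every date is written as month/day/year.

```r
library(tidyr)

x <- c("7/16/2014", "5:06:59 PM", "11:51:26 AM",
       "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM",
       "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM",
       "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM")

# TRUE where an element looks like a month/day/year date
is_date <- grepl("^\\d+/\\d+/\\d+$", x)

# Two columns with the same positions: dates in one, times in the other,
# and NA wherever the element belongs to the other column
df <- data.frame(
  dateonly = ifelse(is_date, x, NA),
  timeonly = ifelse(is_date, NA, x),
  stringsAsFactors = FALSE
)

# Carry each date down into the blank rows below it
df <- fill(df, dateonly, .direction = "down")

# Drop the rows that held only a date, then recombine date and time
df <- df[!is.na(df$timeonly), ]
combined <- paste(df$dateonly, df$timeonly)
combined
```

Rows that held only a date are dropped after the fill, so the result contains one string per time.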
I do not think this is a concise way to achieve your task, but the following works. I could not come up with a good way of splitting the vector (i.e., x), so I decided to work with a data frame. First, I created a group variable. To do that, as you mentioned in your question, I searched for the indices of the dates (month/day/year). Using those indices and na.locf(), I filled in the group column. Then I split the data by group and pasted each date onto its times with stri_join(). Finally, I unlisted the result. This gives character strings; if you want date-time objects, you still need to convert them.
library(zoo)
library(magrittr)
library(stringi)
x <- c("7/16/2014", "5:06:59 PM", "11:51:26 AM",
"7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM",
"7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM",
"7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM")
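# Create a data frame (stringsAsFactors = FALSE keeps date as character
# regardless of the R version's default)
mydf <- data.frame(date = x, group = NA, stringsAsFactors = FALSE)
# Get indices for date (month/day/year)
ind <- grep(pattern = "\\d+/\\d+/\\d+", x = mydf$date)
# Add group number to the ind positions of mydf$group and
# fill the NAs by carrying the last group number forward
mydf$group[ind] <- seq_along(ind)
mydf$group <- na.locf(mydf$group)
# Split the data frame by group and create dates (in character).
# d$date[-1] drops the date row itself; unlike 2:length(d$date), it does
# not produce bogus strings when a date has no times under it
split(mydf, mydf$group) %>%
  lapply(function(d) {
    stri_join(d$date[1], d$date[-1], sep = " ")}) %>%
  unlist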
11 12 21 22
"7/16/2014 5:06:59 PM" "7/16/2014 11:51:26 AM" "7/13/2014 3:53:16 PM" "7/13/2014 3:24:19 PM"
23 3 4 51
"7/13/2014 11:47:49 AM" "7/12/2014 11:57:41 AM" "7/11/2014 10:01:48 AM" "7/10/2014 4:54:08 PM"
52 53
"7/10/2014 2:23:04 PM" "7/10/2014 11:34:09 AM"
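The answer stops at character strings, while the question asks for a POSIX date-time field. Below is a minimal base-R sketch of the remaining step. It assumes that the vector always starts with a date, that dates are month/day/year, that times use a 12-hour clock with an AM/PM marker the current locale can read, and that the values are meant as UTC (the question does not say which time zone). It also swaps the na.locf() group column for cumsum() over the date positions, then parses the pasted strings with as.POSIXct().

```r
x <- c("7/16/2014", "5:06:59 PM", "11:51:26 AM",
       "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM",
       "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM",
       "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM")

# TRUE where an element is a month/day/year date
is_date <- grepl("^\\d+/\\d+/\\d+$", x)

# Group id increments at every date, so each time shares its date's id
# (assumes x starts with a date; otherwise the leading ids would be 0)
group <- cumsum(is_date)

# For every element, the date that heads its group
group_date <- x[is_date][group]

# Keep only the time elements and put their date in front
stamps <- paste(group_date[!is_date], x[!is_date])

# %m/%d/%Y parses the date, %I:%M:%S %p the 12-hour time with AM/PM;
# tz = "UTC" is an assumption, substitute the data's real time zone
times <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
times
```

Because %p is locale-dependent, it may be worth checking anyNA(times) after parsing: any NA indicates a string that did not match the format.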