Sorting and ranking a dataframe by date and time in R

Problem description

I have a dataframe as below. Originally it was just two columns/variables: "Timestamp" (which contains date and time) and "Actor". I broke the "Timestamp" variable down into "date" and "time", and then broke "time" further down into "hours" and "mins". This gives the following structure:

dataf<-structure(list(hours = structure(c(3L, 4L, 4L, 3L, 3L, 3L, 6L, 
6L, 6L, 6L, 6L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 1L, 1L, 2L, 2L), .Label = c("9", 
"12", "14", "15", "16", "17"), class = "factor"), mins = structure(c(17L, 
1L, 2L, 14L, 15L, 16L, 3L, 4L, 6L, 6L, 7L, 9L, 9L, 13L, 13L, 
10L, 11L, 12L, 2L, 5L, 8L, 8L), .Label = c("00", "04", "08", 
"09", "10", "12", "13", "18", "19", "20", "21", "22", "27", "39", 
"51", "52", "59"), class = "factor"), date = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 4L, 4L, 
4L, 1L, 1L, 1L, 1L), .Label = c("4/28/2014", "5/18/2014", "5/2/2014", 
"5/6/2014"), class = "factor"), time = structure(c(7L, 8L, 9L, 
4L, 5L, 6L, 13L, 14L, 15L, 15L, 16L, 2L, 2L, 3L, 3L, 10L, 11L, 
12L, 17L, 18L, 1L, 1L), .Label = c("12:18", "12:19", "12:27", 
"14:39", "14:51", "14:52", "14:59", "15:00", "15:04", "16:20", 
"16:21", "16:22", "17:08", "17:09", "17:12", "17:13", "9:04", 
"9:10"), class = "factor"), Timestamp = structure(c(13L, 14L, 
15L, 10L, 11L, 12L, 6L, 7L, 8L, 8L, 9L, 2L, 2L, 3L, 3L, 16L, 
17L, 18L, 4L, 5L, 1L, 1L), .Label = c("4/28/2014 12:18", "4/28/2014 12:19", 
"4/28/2014 12:27", "4/28/2014 9:04", "4/28/2014 9:10", "5/18/2014 17:08", 
"5/18/2014 17:09", "5/18/2014 17:12", "5/18/2014 17:13", "5/2/2014 14:39", 
"5/2/2014 14:51", "5/2/2014 14:52", "5/2/2014 14:59", "5/2/2014 15:00", 
"5/2/2014 15:04", "5/6/2014 16:20", "5/6/2014 16:21", "5/6/2014 16:22"
), class = "factor"), Actor = c(7L, 7L, 7L, 7L, 7L, 7L, 5L, 5L, 
2L, 12L, 2L, 7L, 7L, 7L, 7L, 10L, 10L, 10L, 7L, 10L, 7L, 7L)), .Names = c("hours", 
"mins", "date", "time", "Timestamp", "Actor"), row.names = c(NA, 
-22L), class = "data.frame")    

I split the timestamp and time variables into separate variables because I have had a lot of problems sorting my real data by date and/or time. Breaking these variables into smaller chunks has made sorting much easier.
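
Much of that sorting trouble probably comes from the timestamps being stored as text or factors. Text is compared character by character. The Timestamp factor in the dput output above shows this: its levels place "4/28/2014 9:04" after "4/28/2014 12:27" and "5/18/2014 ..." before "5/2/2014 ...". Because order() sorts a factor by its level codes, it would follow that text order. The sketch below is minimal and assumes every timestamp matches the month/day/year hour:minute pattern shown. It parses the strings once into POSIXct and sorts on that value instead. The sample strings come from the data; UTC and the variable names are illustrative assumptions.

```r
# Four timestamps copied from the question's data, deliberately out of order
stamps <- c("5/2/2014 14:39", "4/28/2014 9:04", "5/18/2014 17:08", "4/28/2014 12:18")

# Sorting the raw text compares character by character, so
# "4/28/2014 12:18" precedes "4/28/2014 9:04" and "5/18" precedes "5/2"
sort(stamps)

# Parsing once into POSIXct (time zone assumed to be UTC) gives chronological order
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %H:%M", tz = "UTC")
stamps[order(parsed)]
```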

What I would like to do now is create a new variable called "Rank", which would return a '1' for the earliest event in the dataframe (which would be the observation at 9.04am on the 28th April 2014), then a '2' for the next observation in date/time order and so on.

Sorting the dataframe appears to be relatively trivial:

dataf<-dataf[order(as.Date(dataf$date, format="%m/%d/%Y"), dataf$hours, dataf$mins),]

This does the job. What I am struggling with now is assigning ranks.
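
The sort above has one caveat: hours and mins are factors, and order() sorts a factor by its level codes, not by the numbers its labels show. It works on this data because the hours levels happen to be in numeric order ("9", "12", "14", ...) and the minutes are zero-padded. A factor built with the default alphabetical levels would place "9" after "17". Here is a minimal sketch of the problem and a safer conversion, using a hypothetical three-value factor:

```r
# Hypothetical hour labels; factor() sorts its levels alphabetically by default
h <- factor(c("17", "9", "12"))
levels(h)                        # "12" "17" "9": alphabetical, not numeric

# order() uses the level codes, so "9" ends up last
as.character(h[order(h)])        # "12" "17" "9"

# Converting the labels to integers first restores numeric order
h_num <- as.integer(as.character(h))
as.character(h[order(h_num)])    # "9" "12" "17"
```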

I tried the following, because I have previously used 'ave' in combination with FUN=rank to rank integers, but the result is laughably wrong:

dataf$rank <- ave((dataf[order(as.Date(dataf$date, format="%m/%d/%Y"), dataf$hours, dataf$mins),]),FUN=rank )

Any help appreciated.
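
The ave() attempt above goes wrong because its first argument is the whole reordered data frame, not a single vector of sort keys. As a result, rank() is not ranking the event times at all. ave() applies FUN within the groups given by its extra arguments; with no grouping it simply applies FUN to all of x. For one overall ranking, plain rank() on a single numeric key is enough, and the data frame does not need to be sorted first. The sketch below is minimal and keeps the question's split date/hours/mins columns. The small stand-in data frame `ex` and the minutes-since-epoch key are illustrative assumptions.

```r
# Small stand-in for the question's data, stored as factors like the dput output
ex <- data.frame(
  date  = factor(c("5/2/2014", "4/28/2014", "4/28/2014", "4/28/2014")),
  hours = factor(c("14", "12", "9", "12")),
  mins  = factor(c("39", "18", "04", "18"))
)

# One numeric key per row: minutes since 1970-01-01
key <- as.numeric(as.Date(as.character(ex$date), format = "%m/%d/%Y")) * 1440 +
  as.integer(as.character(ex$hours)) * 60 +
  as.integer(as.character(ex$mins))

# Rank the key directly; rows with the same timestamp share the lowest rank
ex$Rank <- rank(key, ties.method = "min")
ex
```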

Solution

I do not share your aversion to datetime objects; using one makes this all much simpler:

dataf$ts <- strptime(as.character(dataf$Timestamp),'%m/%d/%Y %H:%M')
dataf <- dataf[order(dataf$ts),]
dataf$ts_rank <- rank(dataf$ts,ties.method = "min")
dataf
##    hours mins      date  time       Timestamp Actor                  ts ts_rank
## 19     9   04 4/28/2014  9:04  4/28/2014 9:04     7 2014-04-28 09:04:00       1
## 20     9   10 4/28/2014  9:10  4/28/2014 9:10    10 2014-04-28 09:10:00       2
## 21    12   18 4/28/2014 12:18 4/28/2014 12:18     7 2014-04-28 12:18:00       3
## 22    12   18 4/28/2014 12:18 4/28/2014 12:18     7 2014-04-28 12:18:00       3
## 12    12   19 4/28/2014 12:19 4/28/2014 12:19     7 2014-04-28 12:19:00       5
## 13    12   19 4/28/2014 12:19 4/28/2014 12:19     7 2014-04-28 12:19:00       5
## 14    12   27 4/28/2014 12:27 4/28/2014 12:27     7 2014-04-28 12:27:00       7
## 15    12   27 4/28/2014 12:27 4/28/2014 12:27     7 2014-04-28 12:27:00       7
## 4     14   39  5/2/2014 14:39  5/2/2014 14:39     7 2014-05-02 14:39:00       9
## 5     14   51  5/2/2014 14:51  5/2/2014 14:51     7 2014-05-02 14:51:00      10
## 6     14   52  5/2/2014 14:52  5/2/2014 14:52     7 2014-05-02 14:52:00      11
## 1     14   59  5/2/2014 14:59  5/2/2014 14:59     7 2014-05-02 14:59:00      12
## 2     15   00  5/2/2014 15:00  5/2/2014 15:00     7 2014-05-02 15:00:00      13
## 3     15   04  5/2/2014 15:04  5/2/2014 15:04     7 2014-05-02 15:04:00      14
## 16    16   20  5/6/2014 16:20  5/6/2014 16:20    10 2014-05-06 16:20:00      15
## 17    16   21  5/6/2014 16:21  5/6/2014 16:21    10 2014-05-06 16:21:00      16
## 18    16   22  5/6/2014 16:22  5/6/2014 16:22    10 2014-05-06 16:22:00      17
## 7     17   08 5/18/2014 17:08 5/18/2014 17:08     5 2014-05-18 17:08:00      18
## 8     17   09 5/18/2014 17:09 5/18/2014 17:09     5 2014-05-18 17:09:00      19
## 9     17   12 5/18/2014 17:12 5/18/2014 17:12     2 2014-05-18 17:12:00      20
## 10    17   12 5/18/2014 17:12 5/18/2014 17:12    12 2014-05-18 17:12:00      20
## 11    17   13 5/18/2014 17:13 5/18/2014 17:13     2 2014-05-18 17:13:00      22
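
With ties.method = "min", rows that share a timestamp also share a rank, and the next distinct time skips ahead (3, 3, 5 in the output above). The question asks for "a '2' for the next observation". If that means strictly consecutive numbers, two other choices may fit better. The sketch below is minimal and uses four times taken from the data, assuming UTC. It also uses as.POSIXct rather than strptime(), which returns POSIXlt; POSIXct is generally the safer type to store in a data frame column.

```r
# Four event times from the question, with one tie (time zone assumed to be UTC)
times <- as.POSIXct(c("4/28/2014 9:04", "4/28/2014 12:18",
                      "4/28/2014 12:18", "4/28/2014 12:19"),
                    format = "%m/%d/%Y %H:%M", tz = "UTC")

rank(times, ties.method = "min")    # 1 2 2 4: ties share the lowest rank, then a gap
rank(times, ties.method = "first")  # 1 2 3 4: ties broken by row order

# "Dense" ranking: ties share a rank and no numbers are skipped
match(times, sort(unique(times)))   # 1 2 2 3
```

In the answer's code, this would mean `dataf$ts <- as.POSIXct(as.character(dataf$Timestamp), format = "%m/%d/%Y %H:%M")` in place of the strptime() line, with the rest unchanged.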
