Replacing seconds since the epoch with ordered digits in R


Problem description


I have a dataset where the timestamp is in seconds since the epoch:

   id      event       time       
2 722     opened 1356931342
1 723     opened 1356963741
4 721 referenced 1356988186
5 721     closed 1356988186
3 721 referenced 1356988206

However, processing a large number of very long timestamps creates serious performance issues with the algorithm I'm using (optimal matching distances), so I want to reduce them to a simple ordering of which event came first (or at the same time). By this I mean that the earliest event (row) in the dataset should be 1, then 2, 3, 4, etc. If two rows have exactly the same timestamp (seconds since the epoch), they need to be given the same number in the new, reduced format. The output should look something like this:

   id      event       time       
2 722     opened       1
1 723     opened       2
4 721 referenced       3
5 721     closed       3
3 721 referenced       4

Here the "time" column is essentially a vector of numbers (not a factor; a factor will not work, since I'm trying to solve a performance issue).

I can order the dataframe using:

df <- df[with(df, order(time)), ]

However, how do I replace the timestamps with consecutive integers in time order (with the same integer for equal timestamps)?

Solution

I'd use match and unique to create an integer vector in the following manner, unless you have a specific reason to require your time column as a factor variable. Note that this relies on df already being sorted by time, as done with order() in the question.

df$newtime <- match( df$time , unique( df$time ) )
#   id      event       time newtime
#2 722     opened 1356931342       1
#1 723     opened 1356963741       2
#4 721 referenced 1356988186       3
#5 721     closed 1356988186       3
#3 721 referenced 1356988206       4
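
For anyone who wants to reproduce this, here is a minimal sketch. The unsorted row order of df is an assumption, reconstructed from the row names shown in the question; the rest follows the order() call and the match/unique line above.

```r
# Rebuild the example data. The initial (unsorted) row order is an
# assumption inferred from the row names 2, 1, 4, 5, 3 in the question.
df <- data.frame(
  id    = c(723, 722, 721, 721, 721),
  event = c("opened", "opened", "referenced", "referenced", "closed"),
  time  = c(1356963741, 1356931342, 1356988206, 1356988186, 1356988186),
  stringsAsFactors = FALSE
)

# Sort by time first, exactly as in the question.
df <- df[with(df, order(time)), ]

# On sorted data, unique() returns the distinct timestamps in increasing
# order, so match() yields the position of each timestamp in that order.
df$newtime <- match(df$time, unique(df$time))

print(df)
```

Equal timestamps share one integer because match() looks each value up in the de-duplicated vector.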

The code for factor uses match and unique internally anyway.
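
One caveat worth sketching: match(x, unique(x)) numbers values by first appearance, so it only reflects time order when the data is already sorted. A variant that sorts the unique values first does not depend on row order. The vector below reuses the example timestamps in an assumed unsorted order, and the variable names are only illustrative.

```r
# Example timestamps in an (assumed) unsorted order.
time <- c(1356963741, 1356931342, 1356988206, 1356988186, 1356988186)

# Numbering by first appearance: not a time ordering on unsorted input.
by_appearance <- match(time, unique(time))

# Sorting the unique values first gives a dense rank by time,
# regardless of how the rows are ordered.
dense_rank <- match(time, sort(unique(time)))

# For comparison: the integer codes of a factor built from the same vector.
via_factor <- as.integer(factor(time))

print(by_appearance)
print(dense_rank)
print(identical(dense_rank, via_factor))
```

The result is a plain integer vector rather than a factor, which is what the question asks for.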
