Replacing seconds since the epoch with ordered digits in R
Problem description
I have a dataset where the timestamp is in seconds since the epoch:
   id      event       time
2 722     opened 1356931342
1 723     opened 1356963741
4 721 referenced 1356988186
5 721     closed 1356988186
3 721 referenced 1356988206
However, because processing a large number of very long timestamps creates serious performance issues with the algorithm I'm using (optimal matching distances), I want to reduce the timestamps to a simple ordering of which event came first (or at the same time). By this I mean that the earliest event (row) in the dataset should be 1, then 2, 3, 4, etc. If two rows have exactly the same number (seconds since the epoch), they need to be given the same number in the new, reduced format. Hence, the output should look something like this:
   id      event time
2 722     opened    1
1 723     opened    2
4 721 referenced    3
5 721     closed    3
3 721 referenced    4
Here the time column should be a plain vector of numbers, not a factor; a factor will not work, since I'm trying to solve a performance issue.
I can order the dataframe using:
df <- df[with(df, order(time)), ]
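A self-contained sketch that rebuilds the example data so the ordering call above can be run directly. Two things here are assumptions rather than part of the question: the original (unsorted) row order, which is inferred from the printed row names 2, 1, 4, 5, 3, and the column types (integer id, character event, numeric time).

```r
# Rebuild the example data in its assumed original row order. The row names
# in the printed output (2, 1, 4, 5, 3) are read as the original positions.
df <- data.frame(
  id    = c(723L, 722L, 721L, 721L, 721L),
  event = c("opened", "opened", "referenced", "referenced", "closed"),
  time  = c(1356963741, 1356931342, 1356988206, 1356988186, 1356988186),
  stringsAsFactors = FALSE
)

# Sort by time. order() is stable, so the tied rows 4 and 5
# keep their relative order.
df <- df[with(df, order(time)), ]
print(df)
```

Printing the sorted frame should reproduce the first table in the question, including the row names.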
However, how do I replace the timestamps with ordered integers 1, 2, 3, ... (with the same number for equal timestamps)?
Solution

I'd use match and unique to create an integer vector in the following manner, unless you have a specific reason to require your time column as a factor variable:

df$newtime <- match(df$time, unique(df$time))
#    id      event       time newtime
#  2 722     opened 1356931342       1
#  1 723     opened 1356963741       2
#  4 721 referenced 1356988186       3
#  5 721     closed 1356988186       3
#  3 721 referenced 1356988206       4

Because match() against unique() numbers the timestamps in order of first appearance, this relies on df already being sorted by time, as with the order() call in the question.
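The match()/unique() call above numbers timestamps by first appearance, so it only gives the earliest-is-1 numbering after sorting. A minimal sketch of a variant that does not need sorted input matches against the sorted unique values instead, which is a dense ranking. The helper name dense_rank_times is hypothetical, and the timestamps are the question's values in their assumed original row order.

```r
# Hypothetical helper: dense ranking of timestamps. The earliest time gets 1,
# equal times share a number, and no numbers are skipped after ties.
# Unlike match(time, unique(time)), it does not require sorted input.
dense_rank_times <- function(time) {
  match(time, sort(unique(time)))
}

# Timestamps in the question's assumed original (unsorted) row order
time <- c(1356963741, 1356931342, 1356988206, 1356988186, 1356988186)

dense_rank_times(time)
# [1] 2 1 4 3 3

# The first-appearance version gives a different result on unsorted input
match(time, unique(time))
# [1] 1 2 3 4 4

# For contrast, rank() with ties.method = "min" leaves a gap after the tie
rank(time, ties.method = "min")
# [1] 2 1 5 3 3
```

The rank() line shows why a plain rank is not what the question asks for: the value after the tie jumps from 3 to 5 instead of 4.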
The code for factor uses match and unique anyway.
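As a sketch of that point: factor() builds its levels from the sorted unique values and then matches each value against them, so taking its integer codes with as.integer() gives the same dense numbering while keeping the column a plain integer vector rather than a factor. The timestamps are the question's values in their assumed original row order.

```r
# factor() sorts the unique values into levels and matches each value
# against them, so the underlying integer codes form a dense ranking.
time <- c(1356963741, 1356931342, 1356988206, 1356988186, 1356988186)

codes <- as.integer(factor(time))  # drop the factor class, keep the codes
codes
# [1] 2 1 4 3 3
```

This matches match(time, sort(unique(time))). Going through factor() also converts every value to character for the levels, so calling match() directly is likely the cheaper option when performance is the concern.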