Speeding up conversion of 2 million rows of date strings to POSIXct

Problem description

I have a csv which includes about 2 million rows of date strings in the format:

2012/11/13 21:10:00 

Let's call this column csv$Date.and.Time.

I want to convert these dates (and their accompanying data) to xts as quickly as possible.

I have written a script which performs the conversion just fine (see below), but it's terribly slow and I'd like to speed this up as much as possible.

Here is my current methodology. Does anyone have any suggestions on how to make this faster?

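# Parse the date strings as UTC.
dt <- as.POSIXct(csv$Date.and.Time, tz = "UTC")

# Render the times in the target time zone z (this returns character strings).
idx <- format(dt, tz = z, usetz = TRUE)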

So the script converts these date strings to POSIXct. It then does a time zone conversion using format (z is a variable holding the time zone to which I am converting). I then make a regular xts call to turn this, together with the rest of the data in the csv, into an xts series.
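Two details of the snippet above are worth noting: when as.POSIXct() is called without a format argument it has to try several candidate layouts, and format() returns character strings rather than date-times. A minimal base R sketch of an alternative, assuming every string looks like 2012/11/13 21:10:00; the sample values and the target zone are illustrative, not taken from the question:

```r
# Sample input in the same layout as csv$Date.and.Time (illustrative values).
x <- c("2012/11/13 21:10:00", "2012/11/13 21:15:00")

# An explicit format means R does not have to guess the layout.
dt <- as.POSIXct(x, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Hypothetical target zone; in the question this is the variable z.
z <- "America/New_York"

# Changing the tzone attribute only changes how the instants are displayed.
# The underlying seconds since the epoch are unchanged and idx stays POSIXct.
idx <- dt
attr(idx, "tzone") <- z
```

Because idx is still a POSIXct vector, it can be used directly as the order.by argument of xts() without being parsed again.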

This works 100%. It's just very, very slow. I've tried running this in parallel (it doesn't do anything; if anything it makes it worse). What do I mean by 'slow'?

 user    system   elapsed 
155.246  16.430 171.650 

That's on a 3 GHz, 16 GB RAM 2012 MacBook Pro. I can get about half that on a similar processor with 32 GB RAM on a Win7 machine.

I'm sure someone has a better idea - I'm open to suggestions via Rcpp etc. However, ideally the solution works with the csv rather than some other method, like setting up a database. Having said that, I'm up to doing this via whatever method is going to give the fastest conversion.

I'd be super appreciative of any help at all. Thanks in advance.

Recommended answer

You want the small and simple fasttime package by Simon, which does this in the fastest possible way: it does not call time-parsing functions at all and uses only C-level string functions.

It does not support as many formats as strptime. In fact, it doesn't even have a format string. But well-formed ISO format variants, that is yyyy-mm-dd hh:mm:ss.fff will work, and your / separator may just work too.
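A minimal sketch of what this might look like, assuming fasttime is installed; the base R fallback is only there so the snippet also runs without it. fastPOSIXct() reads the digit fields in year, month, day, hour, minute, second order and always treats the input as UTC, with its tz argument affecting only how the result is displayed. The sample strings and the target zone are illustrative, so it is worth checking a handful of real rows against as.POSIXct() before converting the full file.

```r
# Sample input in the same layout as csv$Date.and.Time (illustrative values).
x <- c("2012/11/13 21:10:00", "2012/11/13 21:15:00")

if (requireNamespace("fasttime", quietly = TRUE)) {
  # No format string: digit runs separated by non-digits are read as
  # year, month, day, hour, minute, second, interpreted as UTC.
  dt <- fasttime::fastPOSIXct(x, tz = "UTC")
} else {
  # Fallback for machines without fasttime: base R with an explicit format.
  dt <- as.POSIXct(x, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
}

# Hypothetical target zone; this changes the display only, not the instants.
attr(dt, "tzone") <- "America/New_York"
```

The resulting dt can then be passed as order.by to xts() together with the remaining columns of the csv.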
