Speed up conversion of 2 million rows of date strings to POSIXct


Question

I have a CSV containing about 2 million rows of date strings in the format:

2012/11/13 21:10:00 

Let's call this column csv$Date.and.Time.

I want to convert these dates (and their accompanying data) to xts as quickly as possible.

I have written a script that performs the conversion just fine (see below), but it is terribly slow and I'd like to speed it up as much as possible.

Here is my current approach. Does anyone have suggestions on how to make it faster?

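# Parse the date strings as UTC timestamps
dt <- as.POSIXct(csv$Date.and.Time, tz = "UTC")
# Render them in the target time zone z (this returns character strings)
idx <- format(dt, tz = z, usetz = TRUE)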

So the script converts these date strings to POSIXct. It then does a time zone conversion using format (z is a variable holding the time zone I am converting to). After that, I make a regular xts call to turn this into an xts series together with the rest of the data in the CSV.

This works 100%. It is just very, very slow. I have tried running it in parallel, which does not help; if anything, it makes things worse. What do I mean by 'slow'?

 user    system   elapsed 
155.246  16.430 171.650 

That's on a 3 GHz, 16 GB RAM 2012 MacBook Pro. I can get about half that on a similar processor with 32 GB RAM on a Win7 machine.

I'm sure someone has a better idea, and I'm open to suggestions via Rcpp etc. Ideally, though, the solution works with the CSV rather than through some other route, such as setting up a database. Having said that, I'm happy to use whatever method gives the fastest conversion.

I'd be super appreciative of any help at all. Thanks in advance.

Recommended answer

You want the small and simple fasttime package by Simon, which does this in the fastest possible way: it does not call time-parsing functions at all, but just uses C-level string functions.

It does not support as many formats as strptime. In fact, it does not even take a format string. But well-formed ISO format variants, that is yyyy-mm-dd hh:mm:ss.fff, will work, and your / separator may just work too.
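As a rough illustration of the suggestion above, here is a minimal sketch. The sample vector, the target zone stored in z, and the helper name parse_utc are hypothetical and not from the original post. It assumes that fasttime::fastPOSIXct reads its input as UTC and that its tz argument only sets the time zone attribute of the result. The chartr call replaces "/" with "-" as a defensive step and may be unnecessary. If fasttime is not installed, the sketch falls back to base R with an explicit format string.

```r
# Hypothetical sample standing in for csv$Date.and.Time
date_strings <- c("2012/11/13 21:10:00", "2012/11/13 21:15:00")
z <- "America/New_York"  # assumed target time zone (the question's variable z)

# parse_utc is an illustrative helper, not part of any package
parse_utc <- function(x, tz_out = "UTC") {
  if (requireNamespace("fasttime", quietly = TRUE)) {
    # Input is interpreted as UTC; tz only controls how the result is displayed.
    # chartr() normalizes the separator in case "/" is not accepted.
    fasttime::fastPOSIXct(chartr("/", "-", x), tz = tz_out)
  } else {
    # Base R fallback: an explicit format avoids format guessing
    dt <- as.POSIXct(x, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
    # Changing the display zone is just an attribute update, no reparsing
    attr(dt, "tzone") <- tz_out
    dt
  }
}

idx <- parse_utc(date_strings, tz_out = z)
print(idx)
```

Because idx is still a POSIXct vector rather than the character vector that format returns, it should be usable directly as the order.by argument of xts, which would also avoid a second round of string parsing.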
