Converting Time & Date to relative time (CSV processing)


Question

I am in the early stages of writing a multi-faceted investment algorithm. The part I am currently working on uses a Graphical Gaussian Model with a LASSO penalty to find inter-dependencies that can inform investment strategy. I am trying to use Java to pre-process historical CSV input data and create a new CSV output file containing only the relevant data.

The raw, small-scale example data I am using to test the processing algorithm (which will eventually run on a Reuters Eikon live feed) is in txt/CSV format. I have a folder of text files with historical data on many NYSE stocks. Although there are 8 columns, the three I am interested in (for pre-processing before creating a covariance matrix that will feed into 'GLASSO') are the Date, Time and Opening price. The opening price column requires no pre-processing, so it can be copied straight into a new, less noisy output file.

My issue is how to convert the two columns (date and time) into a single time measurement. The most obvious way seems to be to find the earliest point in time in my data and use it as point 0 (in seconds). I would then convert every date and time combination into a single column in the output CSV file showing how many seconds it falls after that original time point. Once this works, rather than specifying a single file path, I would like to specify a folder and have the program loop through all the text files, find the relevant columns and write everything to a single CSV file.

In practice, I would hope for something like this:

CSV title and first entry in one NYSE txt file:

Date,Time,Open,High,Low,Close,Volume,OpenInt
2016-02-03,15:35:00,37.27,37.36,37.17,37.29,25274,0

So essentially, if the first entry is the earliest time reference:

2016-02-03,15:35:00 = '0'

2016-02-03,15:40:00 = '300' (5 minutes is 300 seconds)

Just to reiterate, the input is a folder containing hundreds of CSVs in the following format:

Columns - 1: Date 2: Time 3: Open 4: High 5: Low 6: Close 7: Volume 8: OpenInt

The output is a single CSV file containing:

Columns - 1: Time measure (distance in seconds from the earliest entry point) 2: Stock price for each time measure entry.

Please let me know if you have any clues about how I could go about doing this. Don't hesitate to ask if there is anything I can clarify to make your lives easier; I realise I could have explained this in a less convoluted manner.

Recommended answer

java.time

The Answer by Saviour Self looks correct. But it uses the old date-time classes that have been supplanted by the java.time framework built into Java 8 and later.

As a bonus, I show how to use the Apache Commons CSV library to handle the chore of reading/writing CSV files.

First we simulate a CSV file by making a StringReader.

The RFC 4180 spec defines the CSV format formally. Variations on this also exist.

RFC 4180 requires Carriage Return + Line Feed (CRLF) as the newline (line terminator). The terminator on the last line is optional; we include it here.

We omit the optional header line (column titles).

String newline = "\r\n";
StringBuilder input = new StringBuilder ();
input.append ( "2016-02-03,15:10:00,37" ).append ( newline );
input.append ( "2016-02-03,15:15:00,38" ).append ( newline );  // 5 minutes later.
input.append ( "2016-02-03,15:17:00,39" ).append ( newline );  // 2 minutes later.

Reader in = new StringReader ( input.toString () );

Next we parse the CSV input. The Commons CSV library creates a CSVRecord object to represent each row of incoming data. One line of code sets that up, with CSVFormat::parse producing a CSVParser object (an implementation of Iterable).

Iterable<CSVRecord> records;
try {
    records = CSVFormat.DEFAULT.parse ( in );  // 'records' is a CSVParser.
} catch ( IOException ex ) {
    // FIXME: Handle exception.
    System.out.println ( "[ERROR] " + ex );
    return; // Bail-out.
}

Now we analyze that collection of CSVRecord objects. Remember the first one as our baseline, stored here as an Instant (discussed below). Then loop to compare each successive CSVRecord object, examining each field as a String.

Instant firstInstant = null; // Track the baseline against which we calculate the increasing time
for ( CSVRecord record : records ) {
    String dateInput = record.get ( 0 );  // Zero-based index.
    String timeInput = record.get ( 1 );
    String priceInput = record.get ( 2 );
    //System.out.println ( dateInput + " | " + timeInput + " | " + priceInput );  // Dump input strings for debugging.

Parse the date-only and time-only strings, then combine them into a LocalDateTime.

    // Parse strings.
    LocalDate date = LocalDate.parse ( dateInput );
    LocalTime time = LocalTime.parse ( timeInput );
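    BigDecimal price = new BigDecimal ( priceInput );  // java.math.BigDecimal; real prices such as 37.27 are not integers, so Integer.parseInt would throw.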
    // Combine date and time.
    LocalDateTime ldt = LocalDateTime.of ( date , time );  // Not a specific moment on the timeline.

This date-time object is not a point on the timeline, as we do not know its offset-from-UTC or time zone. If you were to use these values to calculate a delta between LocalDateTime objects, you would be assuming generic 24-hour days free of anomalies such as Daylight Saving Time (DST). You might get away with this if your data happens not to span any anomaly, but it is a bad habit. Better to assign a time zone if known.
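To make the DST point concrete, here is a minimal sketch (not part of the original answer) that compares the two calculations. The class name DstDeltaDemo and its two helper methods are hypothetical, chosen only for illustration, and the dates assume the US switch to daylight time on Sunday 2016-03-13.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical demo class: contrasts a zone-less delta with a zoned delta.
public class DstDeltaDemo {

    // Seconds between two wall-clock values, ignoring any time zone rules.
    static long naiveSeconds(LocalDateTime start, LocalDateTime end) {
        return Duration.between(start, end).getSeconds();
    }

    // Seconds between the same wall-clock values interpreted in the given zone.
    static long zonedSeconds(LocalDateTime start, LocalDateTime end, ZoneId zone) {
        return Duration.between(start.atZone(zone), end.atZone(zone)).getSeconds();
    }

    public static void main(String[] args) {
        ZoneId newYork = ZoneId.of("America/New_York");
        LocalDateTime friday = LocalDateTime.of(2016, 3, 11, 15, 0); // Before the DST change.
        LocalDateTime monday = LocalDateTime.of(2016, 3, 14, 15, 0); // After the DST change.

        // Three generic 24-hour days.
        System.out.println(naiveSeconds(friday, monday));            // 259200
        // One hour shorter, because the clocks jumped forward on Sunday.
        System.out.println(zonedSeconds(friday, monday, newYork));   // 255600
    }
}
```

For data that spans a weekend like this one, the zone-less delta would be off by a full hour, which is why the zone is assigned before any arithmetic.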

We know the source of the data, so we can assume the intended time zone, a ZoneId. By assigning that assumed time zone, we get a real moment on the timeline.

    // Generally best to assign the time zone known to apply to this incoming data.
    ZoneId zoneId = ZoneId.of ( "America/New_York" );  // Move this line somewhere else to eliminate needless repetition.
    ZonedDateTime zdt = ldt.atZone ( zoneId );  // Now this becomes a specific moment on the timeline.

From that ZonedDateTime we can extract the same moment in UTC (an Instant). Generally the Instant is what you should be using for data storage, data exchange, serialization, and so on. You only need the ZonedDateTime for presentation to the user in their expected time zone.
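As a small illustration of that conversion, the sketch below takes the date and time strings from the question's first row and shows the resulting UTC moment. The class ZonedToInstantDemo and its toInstant helper are hypothetical names, and the New York zone is the same assumption made above.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;

// Hypothetical demo class: date string + time string + assumed zone -> Instant.
public class ZonedToInstantDemo {

    // Combines the two CSV fields and pins them to the timeline via the zone.
    static Instant toInstant(String date, String time, ZoneId zone) {
        LocalDateTime ldt = LocalDateTime.of(LocalDate.parse(date), LocalTime.parse(time));
        return ldt.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        ZoneId newYork = ZoneId.of("America/New_York"); // Assumed zone for NYSE data.
        Instant instant = toInstant("2016-02-03", "15:35:00", newYork);
        // In February, New York is five hours behind UTC.
        System.out.println(instant); // 2016-02-03T20:35:00Z
    }
}
```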

    Instant instant = zdt.toInstant ();  // Use Instant (moment on the timeline in UTC) for data storage, exchange, serialization, database, etc.
    if ( null == firstInstant ) {
        firstInstant = instant;  // Capture the first instant.
    }

The goal is to compare each CSVRecord to the original baseline date-time. The Duration.between method does just that.

    Duration duration = Duration.between ( firstInstant , instant );

We calculate the delta in total seconds.

    Long deltaInSeconds = duration.getSeconds ();

Writing these results to an output CSV file is left as an exercise for you, the reader. The Apache Commons CSV library makes short work of that, as it writes as well as reads CSV formats.

    // … output the deltaInSeconds & price to CSV. Apache Commons CSV can write as well as read CSV files.
    System.out.println ( "deltaInSeconds: " + deltaInSeconds + " | price: " + price );

}

When run.

deltaInSeconds: 0 | price: 37
deltaInSeconds: 300 | price: 38
deltaInSeconds: 420 | price: 39
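The question also asks for the baseline to be the earliest entry across a whole folder, with everything written to one file. A rough sketch of that follows, using only the JDK so it runs without extra libraries. Everything in it is an assumption for illustration: the class FolderToRelativeCsv, its methods, the "*.txt" pattern, the header test on "Date", and the simple comma split (safe only if no field contains a quoted comma; Apache Commons CSV handles the general case).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: folder of NYSE text files -> one "seconds,open" CSV.
public class FolderToRelativeCsv {

    // Assumed zone for the incoming data.
    static final ZoneId ZONE = ZoneId.of("America/New_York");

    // One parsed input row: the moment, plus the opening price text left untouched.
    static final class Row {
        final Instant instant;
        final String open;
        Row(Instant instant, String open) { this.instant = instant; this.open = open; }
    }

    // Parses a data line such as "2016-02-03,15:35:00,37.27,37.36,...".
    static Row parseLine(String line) {
        String[] fields = line.split(",");
        LocalDateTime ldt = LocalDateTime.of(
                LocalDate.parse(fields[0].trim()), LocalTime.parse(fields[1].trim()));
        return new Row(ldt.atZone(ZONE).toInstant(), fields[2].trim());
    }

    // Reads every *.txt file in the folder, skipping blank lines and header lines.
    static List<Row> readFolder(Path folder) throws IOException {
        List<Row> rows = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.txt")) {
            for (Path file : files) {
                for (String line : Files.readAllLines(file)) {
                    if (line.trim().isEmpty() || line.startsWith("Date")) continue;
                    rows.add(parseLine(line));
                }
            }
        }
        return rows;
    }

    // Sorts by time, then measures each row in seconds from the earliest instant.
    static List<String> toRelativeLines(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.instant));
        List<String> out = new ArrayList<>();
        if (sorted.isEmpty()) return out;
        Instant baseline = sorted.get(0).instant;
        for (Row r : sorted) {
            out.add(Duration.between(baseline, r.instant).getSeconds() + "," + r.open);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: FolderToRelativeCsv <inputFolder> <outputCsv>");
            return;
        }
        Files.write(Paths.get(args[1]), toRelativeLines(readFolder(Paths.get(args[0]))));
    }
}
```

Sorting first means the baseline is the earliest moment in the data, even when the files are read in arbitrary order. Note that this output has no column identifying which stock a price belongs to, matching the two-column layout described in the question; a covariance calculation would probably need a symbol column added, for example one derived from the file name.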
