Best method for parsing date formats during data import

Question

I wrote a method that parses several different date formats during a data import (400K records). It catches ParseException and retries with the next format whenever the current one does not match.

Question: is there a better (and faster) way to pick the correct date format during the import?

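import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date; // assumed; the original post does not show its imports
import java.util.List;

// The members below belong to the class that performs the import.
private static final String DMY_DASH_FORMAT = "dd-MM-yyyy";
private static final String DMY_DOT_FORMAT = "dd.MM.yyyy";
private static final String YMD_DASH_FORMAT = "yyyy-MM-dd";
private static final String YMD_DOT_FORMAT = "yyyy.MM.dd";
private static final String SIMPLE_YEAR_FORMAT = "yyyy";
private final List<String> dateFormats = Arrays.asList(YMD_DASH_FORMAT, DMY_DASH_FORMAT,
        DMY_DOT_FORMAT, YMD_DOT_FORMAT);

private Date parseDateFromString(String date) throws ParseException {
    // A value of "0" is treated as "no date".
    if (date.equals("0")) {
        return null;
    }
    // Four characters: the value is a bare year.
    if (date.length() == 4) {
        SimpleDateFormat simpleDF = new SimpleDateFormat(SIMPLE_YEAR_FORMAT);
        simpleDF.setLenient(false);
        return new Date(simpleDF.parse(date).getTime());
    }
    // Try each known format in turn until one of them parses the value.
    for (String format : dateFormats) {
        SimpleDateFormat simpleDF = new SimpleDateFormat(format);
        // Strict parsing: a lenient parser would accept e.g. "31-12-2016"
        // under "yyyy-MM-dd" and silently return a wrong date.
        simpleDF.setLenient(false);
        try {
            return new Date(simpleDF.parse(date).getTime());
        } catch (ParseException exception) {
            // Not this format; fall through to the next one.
        }
    }
    throw new ParseException("Unknown date format", 0);
}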

Recommended answer

With 400K records, some "bare hands" optimization might be reasonable here.

For example: if your incoming string has a "-" as its fifth character (index 4), then you know that the only (potentially) matching format is "yyyy-MM-dd". If that character is a ".", you know it is the other format that starts with yyyy.

So, if you really want to optimize, you could fetch that character and see what it is. That could save up to 3 parse attempts with the wrong format!

Beyond that: I am not sure whether "dd" means that your other dates always start with a two-digit day such as "01", or whether "1.1.2016" is possible too. If all your dates always use two digits for day and month, you can repeat the same game: fetch the third character (index 2) to choose between "dd.MM.yyyy" and "dd-MM-yyyy".
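The separator check described above could look roughly like the following. This is only a sketch: the class name SeparatorDateParser and the method choosePattern are made up for illustration, and it assumes that day and month are always written with two digits, so every full date is exactly 10 characters long.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper; the names are illustrative and not from the question.
final class SeparatorDateParser {

    // Picks the one candidate pattern by inspecting the separator positions.
    // Assumption: two-digit day and month, so full dates have 10 characters.
    static String choosePattern(String s) {
        if (s.length() == 4) {
            return "yyyy";
        }
        if (s.length() != 10) {
            return null;
        }
        char fifth = s.charAt(4); // separator position in yyyy?MM?dd
        if (fifth == '-') return "yyyy-MM-dd";
        if (fifth == '.') return "yyyy.MM.dd";
        char third = s.charAt(2); // separator position in dd?MM?yyyy
        if (third == '-') return "dd-MM-yyyy";
        if (third == '.') return "dd.MM.yyyy";
        return null;
    }

    // Parses with exactly one pattern instead of trying each in turn.
    static Date parse(String s) throws ParseException {
        if (s.equals("0")) {
            return null; // same "no date" convention as in the question
        }
        String pattern = choosePattern(s);
        if (pattern == null) {
            throw new ParseException("Unknown date format: " + s, 0);
        }
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);
        return format.parse(s);
    }
}
```

With this approach each record costs at most two character lookups and one parse call, and no exception is thrown for a value that matches one of the known formats.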

Of course, there is one disadvantage: if you follow that idea, you are very much "hard-coding" the expected formats into your code, so adding other formats becomes harder. On the other hand, you would save a lot of failed parse attempts.

Finally: the other thing that might greatly speed things up is to use stream operations for reading and parsing the data, because then you could look into parallel streams and simply exploit the ability of modern hardware to process 4, 8, or 16 dates in parallel.
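A minimal sketch of the parallel-stream idea follows. It swaps in java.time, which the question does not use, because SimpleDateFormat is not thread-safe while DateTimeFormatter is immutable and can be shared across threads. The class name ParallelDateImport is hypothetical, and for brevity the sketch assumes that every input is already in yyyy-MM-dd form; whether parallelism actually pays off for 400K short strings would need to be measured.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; the names are illustrative and not from the question.
final class ParallelDateImport {

    // Parses all values on the common fork-join pool.
    // Assumption: every value is in ISO form (yyyy-MM-dd).
    static List<LocalDate> parseAll(List<String> raw) {
        return raw.parallelStream()
                  .map(s -> LocalDate.parse(s, DateTimeFormatter.ISO_LOCAL_DATE))
                  .collect(Collectors.toList()); // keeps the input order
    }
}
```

An input that does not match the format makes LocalDate.parse throw DateTimeParseException, which surfaces from the terminal collect call.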
