Extremely slow parsing of time zone with the new java.time API

Question

I was migrating a module from the old Java date classes to the new java.time API and noticed a huge drop in performance. It boiled down to parsing dates with a time zone (I parse millions of them at a time).

Parsing a date string without a time zone (yyyy/MM/dd HH:mm:ss) is fast: about 2 times faster than with the old API, roughly 1.5M operations per second on my PC.

However, when the pattern contains a time zone (yyyy/MM/dd HH:mm:ss z), parsing with the new java.time API is about 15 times slower than with the old API, which is about as fast with a time zone as without one. See the performance benchmark below.

Does anyone have an idea how I can parse these strings quickly using the new java.time API? At the moment, as a workaround, I am using the old API for parsing and then converting the Date to an Instant, which is not particularly nice.
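For reference, the workaround described above might look roughly like the following. This is only a sketch: the class name LegacyParseWorkaround, the ThreadLocal wrapper, and the explicit Locale.ENGLISH are assumptions added here, not part of the original module.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Locale;

public class LegacyParseWorkaround {

    // SimpleDateFormat is not thread-safe, so each thread gets its own instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(
            () -> new SimpleDateFormat("yyyy/MM/dd HH:mm:ss z", Locale.ENGLISH));

    /** Parses with the old API, then bridges the resulting Date to java.time. */
    static Instant parse(String text) throws ParseException {
        return FORMAT.get().parse(text).toInstant();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parse("2000/12/12 12:12:12 CET"));
    }
}
```

Date.toInstant() keeps only the point in time, so the parsed zone itself is lost after the conversion.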

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(1)
@Fork(1)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@State(Scope.Thread)
public class DateParsingBenchmark {

    private final int iterations = 100000;

    @Benchmark
    public void oldFormat_noZone(Blackhole bh, DateParsingBenchmark st) throws ParseException {

        SimpleDateFormat simpleDateFormat = 
                new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");

        for(int i=0; i<iterations; i++) {
            bh.consume(simpleDateFormat.parse("2000/12/12 12:12:12"));
        }
    }

    @Benchmark
    public void oldFormat_withZone(Blackhole bh, DateParsingBenchmark st) throws ParseException {

        SimpleDateFormat simpleDateFormat = 
                new SimpleDateFormat("yyyy/MM/dd HH:mm:ss z");

        for(int i=0; i<iterations; i++) {
            bh.consume(simpleDateFormat.parse("2000/12/12 12:12:12 CET"));
        }
    }

    @Benchmark
    public void newFormat_noZone(Blackhole bh, DateParsingBenchmark st) {

        DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
                .appendPattern("yyyy/MM/dd HH:mm:ss").toFormatter();

        for(int i=0; i<iterations; i++) {
            bh.consume(dateTimeFormatter.parse("2000/12/12 12:12:12"));
        }
    }

    @Benchmark
    public void newFormat_withZone(Blackhole bh, DateParsingBenchmark st) {

        DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
                .appendPattern("yyyy/MM/dd HH:mm:ss z").toFormatter();

        for(int i=0; i<iterations; i++) {
            bh.consume(dateTimeFormatter.parse("2000/12/12 12:12:12 CET"));
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(DateParsingBenchmark.class.getSimpleName()).build();
        new Runner(opt).run();    
    }
}

And the results for 100K operations:

Benchmark                                Mode  Cnt     Score     Error  Units
DateParsingBenchmark.newFormat_noZone    avgt    5    61.165 ±  11.173  ms/op
DateParsingBenchmark.newFormat_withZone  avgt    5  1662.370 ± 191.013  ms/op
DateParsingBenchmark.oldFormat_noZone    avgt    5    93.317 ±  29.307  ms/op
DateParsingBenchmark.oldFormat_withZone  avgt    5   107.247 ±  24.322  ms/op

Update:

I just did some profiling of the java.time classes, and indeed, the time zone parser seems to be implemented quite inefficiently. Parsing a standalone time zone is responsible for all of the slowness.

@Benchmark
public void newFormat_zoneOnly(Blackhole bh, DateParsingBenchmark st) {

    DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
            .appendPattern("z").toFormatter();

    for(int i=0; i<iterations; i++) {
        bh.consume(dateTimeFormatter.parse("CET"));
    }
}

There is a class called ZoneTextPrinterParser (nested in DateTimeFormatterBuilder, in the java.time.format package) which internally makes a copy of the set of all available time zones on every parse() call (via ZoneRulesProvider.getAvailableZoneIds()), and this accounts for 99% of the time spent in zone parsing.
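One way to check this on a given JDK is to compare the sets returned by two consecutive calls. The probe below is a sketch (the class and method names are made up for illustration). Its result depends on the Java version in use, so no particular output is claimed here.

```java
import java.time.zone.ZoneRulesProvider;
import java.util.Set;

public class AvailableZoneIdsProbe {

    /** Number of zone IDs the registered providers currently expose. */
    static int zoneIdCount() {
        return ZoneRulesProvider.getAvailableZoneIds().size();
    }

    /** True if two consecutive calls hand back distinct Set instances. */
    static boolean copiesOnEveryCall() {
        Set<String> first = ZoneRulesProvider.getAvailableZoneIds();
        Set<String> second = ZoneRulesProvider.getAvailableZoneIds();
        return first != second;
    }

    public static void main(String[] args) {
        System.out.println("zone ids: " + zoneIdCount());
        System.out.println("fresh set per call: " + copiesOnEveryCall());
    }
}
```

If the second line prints true, every zone parse on that JDK pays for a copy of the whole ID set.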

Well, an answer might be to write my own zone parser, which would not be too nice either, because then I could not build the DateTimeFormatter via appendPattern().
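A hand-rolled variant of that idea, sketched under the assumption that the zone is always the last space-separated token and is either a valid zone ID (such as CET) or a key of ZoneId.SHORT_IDS: parse the date-time part with a zone-free formatter and resolve the zone through a small cache. The class name ZoneCachingParser is hypothetical, and this does not reproduce the localized zone-name matching that the 'z' pattern letter performs.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ZoneCachingParser {

    // Zone-free pattern: this is the part that is already fast in java.time.
    private static final DateTimeFormatter LOCAL =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

    // Each distinct zone token is resolved once and then reused.
    private static final Map<String, ZoneId> ZONE_CACHE = new ConcurrentHashMap<>();

    static ZonedDateTime parse(String text) {
        int split = text.lastIndexOf(' ');
        LocalDateTime local = LocalDateTime.parse(text.substring(0, split), LOCAL);
        ZoneId zone = ZONE_CACHE.computeIfAbsent(
                text.substring(split + 1),
                id -> ZoneId.of(id, ZoneId.SHORT_IDS));
        return ZonedDateTime.of(local, zone);
    }

    public static void main(String[] args) {
        System.out.println(parse("2000/12/12 12:12:12 CET").toInstant());
    }
}
```

The trade-off is exactly the one mentioned above: the pattern is split by hand, so the formatter can no longer be built from a single appendPattern() call.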

Recommended answer

As noted in your question and in my comment, ZoneRulesProvider.getAvailableZoneIds() creates a new set containing the string IDs of all available time zones (the keys of the static final ConcurrentMap<String, ZoneRulesProvider> ZONES) each time a time zone needs to be parsed. [1]

Fortunately, ZoneRulesProvider is an abstract class that is designed to be subclassed. The method protected abstract Set<String> provideZoneIds() is responsible for populating ZONES. Thus, a subclass that knows ahead of time which time zones will be used can provide only those. Since such a class provides far fewer entries than the default provider, which contains hundreds, it has the potential to significantly reduce the invocation time of getAvailableZoneIds().

The ZoneRulesProvider API documentation explains how to register a provider. Note that providers cannot be deregistered, only supplemented, so it is not a simple matter of removing the default provider and adding your own. The system property java.time.zone.DefaultZoneRulesProvider defines the default provider. If it is null (as returned by System.getProperty("...")), the JVM's default provider is loaded. Using System.setProperty("...", "fully-qualified name of a concrete ZoneRulesProvider class"), one can supply one's own provider, namely the subclass discussed in the second paragraph.

So in the end, I suggest:


  1. Subclass the abstract class ZoneRulesProvider.
  2. Implement protected abstract Set<String> provideZoneIds() so that it returns only the needed time zones.
  3. Set the system property to this class.
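The steps above can be sketched as follows. Everything specific here is an assumption for illustration: the class name MinimalZoneRulesProvider, the zone ID Example/PlusOne, and the fixed +01:00 rules are made up, and a real replacement provider would have to supply proper ZoneRules (including daylight-saving transitions) for every zone it exposes.

```java
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;
import java.time.zone.ZoneRulesException;
import java.time.zone.ZoneRulesProvider;
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

public class MinimalZoneRulesProvider extends ZoneRulesProvider {

    // Hypothetical zone ID with fixed-offset rules, for illustration only.
    static final String ZONE_ID = "Example/PlusOne";
    private static final ZoneRules RULES = ZoneRules.of(ZoneOffset.ofHours(1));

    @Override
    protected Set<String> provideZoneIds() {
        // Step 2: expose only the zones that are actually needed.
        return Collections.singleton(ZONE_ID);
    }

    @Override
    protected ZoneRules provideRules(String zoneId, boolean forCaching) {
        if (!ZONE_ID.equals(zoneId)) {
            throw new ZoneRulesException("Unknown zone: " + zoneId);
        }
        return RULES;
    }

    @Override
    protected NavigableMap<String, ZoneRules> provideVersions(String zoneId) {
        NavigableMap<String, ZoneRules> versions = new TreeMap<>();
        versions.put("example-1", provideRules(zoneId, false));
        return versions;
    }

    /** Adds this provider next to the default one; it does not replace it. */
    static void register() {
        if (!ZoneRulesProvider.getAvailableZoneIds().contains(ZONE_ID)) {
            ZoneRulesProvider.registerProvider(new MinimalZoneRulesProvider());
        }
    }
}
```

Calling register() only supplements the default provider, so on its own it does not shrink the set of zone IDs. For step 3 the property would presumably have to be set before ZoneRulesProvider is first loaded, for example with -Djava.time.zone.DefaultZoneRulesProvider=MinimalZoneRulesProvider on the command line.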

I have not tried this myself, but I think it will work.

[1] It was suggested in the comments on the question that the exact behavior of this call might have changed between 1.8 releases.

Edit: found some more information

The aforementioned default ZoneRulesProvider is the final class TzdbZoneRulesProvider, located in java.time.zone. The regions in that class are read from the path JAVA_HOME/lib/tzdb.dat (in my case it is in the JDK's JRE). That file indeed contains many regions; here is a snippet:

 TZDB  2014cJ Africa/Abidjan Africa/Accra Africa/Addis_Ababa Africa/Algiers 
Africa/Asmara 
Africa/Asmera 
Africa/Bamako 
Africa/Bangui 
Africa/Banjul 
Africa/Bissau Africa/Blantyre Africa/Brazzaville Africa/Bujumbura Africa/Cairo Africa/Casablanca Africa/Ceuta Africa/Conakry Africa/Dakar Africa/Dar_es_Salaam Africa/Djibouti 
Africa/Douala Africa/El_Aaiun Africa/Freetown Africa/Gaborone 
Africa/Harare Africa/Johannesburg Africa/Juba Africa/Kampala Africa/Khartoum 
Africa/Kigali Africa/Kinshasa Africa/Lagos Africa/Libreville Africa/Lome 
Africa/Luanda Africa/Lubumbashi 
Africa/Lusaka 
Africa/Malabo 
Africa/Maputo 
Africa/Maseru Africa/Mbabane Africa/Mogadishu Africa/Monrovia Africa/Nairobi Africa/Ndjamena 
Africa/Niamey Africa/Nouakchott Africa/Ouagadougou Africa/Porto-Novo Africa/Sao_Tome Africa/Timbuktu Africa/Tripoli Africa/Tunis Africa/Windhoek America/Adak America/Anchorage America/Anguilla America/Antigua America/Araguaina America/Argentina/Buenos_Aires America/Argentina/Catamarca  America/Argentina/ComodRivadavia America/Argentina/Cordoba America/Argentina/Jujuy America/Argentina/La_Rioja America/Argentina/Mendoza America/Argentina/Rio_Gallegos America/Argentina/Salta America/Argentina/San_Juan America/Argentina/San_Luis America/Argentina/Tucuman America/Argentina/Ushuaia 
America/Aruba America/Asuncion America/Atikokan America/Atka 
America/Bahia

If one finds a way to create a similar file with only the needed zones and to load that one instead, the performance issue should be resolved.
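As a first step toward that, the region list at the start of the file can be read back with a DataInputStream. The layout assumed below (a version byte of 1, the UTF string "TZDB", a count of version IDs, then a count of region IDs) is inferred from the snippet above and from my reading of the JDK's loader. It is not a documented format, so treat this as a sketch; TzdbHeaderReader is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TzdbHeaderReader {

    /** Reads the region IDs from the header of a tzdb.dat file (assumed layout). */
    static List<String> readRegionIds(Path tzdbFile) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(tzdbFile)))) {
            if (in.readByte() != 1) {
                throw new IOException("Unrecognised format version");
            }
            if (!"TZDB".equals(in.readUTF())) {
                throw new IOException("Unrecognised group id");
            }
            int versionCount = in.readShort();
            for (int i = 0; i < versionCount; i++) {
                in.readUTF(); // version id such as "2014c", skipped here
            }
            int regionCount = in.readShort();
            List<String> regions = new ArrayList<>(regionCount);
            for (int i = 0; i < regionCount; i++) {
                regions.add(in.readUTF());
            }
            return regions;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(System.getProperty("java.home"), "lib", "tzdb.dat");
        List<String> regions = readRegionIds(file);
        System.out.println(regions.size() + " regions, first: " + regions.get(0));
    }
}
```

Writing a trimmed file would also require copying the matching rule data that follows the header, which this sketch does not attempt.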
