When a busy-spinning Java thread is bound to a physical core, can a context switch happen because a new branch in the code is reached?


Question

I am interested in low-latency code and that's why I tried to configure thread affinity. In particular, it was supposed to help avoid context switches.

I have configured thread affinity using https://github.com/OpenHFT/Java-Thread-Affinity. I run a very simple test that just spins in a loop checking a time condition.
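For reference, pinning with that library typically looks like the following minimal sketch (based on the AffinityLock API from the library's README; the exact usage in my repo may differ):

    import net.openhft.affinity.AffinityLock;

    // Acquire a reserved core; the lock is released when the block exits.
    try (AffinityLock al = AffinityLock.acquireLock())
    {
        // the busy-spin loop below runs here, pinned to the reserved core
    }

The spin loop under test: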

    // TimeUtils is a helper from the test repo (a cached nanosecond clock).
    final long start = TimeUtils.now(); // start is initialized before the loop (added here for completeness)
    long now = start;
    while (true)
    {
        if (now < start + TimeUtils.NANOS_IN_SECOND * delay)
        {
            now = TimeUtils.now();
        }
        else
        {
            // Will be printed after 30 sec
            if (TimeUtils.now() > start + TimeUtils.NANOS_IN_SECOND * (delay + 30))
            {
                final long finalNow = now;
                System.out.println("Time is over at " +
                        TimeUtils.toInstant(finalNow) + " now: " +
                        TimeUtils.toInstant(TimeUtils.now()));
                System.exit(0);
            }
        }
    }

So, after the specified delay, execution goes to the "else" branch, and at approximately the same time I see a context switch. Is this expected behavior? What is the specific reason for it? How can I avoid context switches in such cases?

Test details

I built a shadowJar from this repo: https://github.com/stepan2271/thread-affinity-example. Then I ran it using the following command (one can play around with the numbers here; they have no significant effect on the test when delay > 60):

taskset -c 19 java -DtestLoopBindingCpu=3 -Ddelay=74 -cp demo-all.jar main.TestLoop

I also use the following script to monitor context switches (it should be run with the ID of the Java thread that is bound to the core):

#!/bin/bash
# $1 = ID of the Java thread that is bound to the core
while true
do
    date >> ~/demo-ctxt-switches.log
    grep ctxt /proc/$1/status >> ~/demo-ctxt-switches.log
    sleep 3
done
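(Invoked, for example, as ./ctxt-switch-monitor.sh <thread-id> &; the script name is illustrative, and the native thread ID can be found with top -H -p <java-pid>.)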

Typical output of this script is the following:

Fri Oct 16 18:23:29 MSK 2020
voluntary_ctxt_switches:    90
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:32 MSK 2020
voluntary_ctxt_switches:    90
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:35 MSK 2020
voluntary_ctxt_switches:    90
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:38 MSK 2020
voluntary_ctxt_switches:    90
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:41 MSK 2020
voluntary_ctxt_switches:    91
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:44 MSK 2020
voluntary_ctxt_switches:    91
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:47 MSK 2020
voluntary_ctxt_switches:    91
nonvoluntary_ctxt_switches: 37

So, after some changes around start time these numbers become stable, and then I see exactly 1 to 3 switches (within less than 1 second of each other) at the time when the code reaches the "else" branch.

Deviations

The base configuration reproduces this behavior almost every time, while some deviations lead to situations where I could not reproduce it. Examples:

https://github.com/stepan2271/thread-affinity-example/tree/without-log4j

https://github.com/stepan2271/thread-affinity-example/tree/without-cached-nano-clock

Test environment

2 * Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz

Red Hat Enterprise Linux 8.1 (Ootpa)

Cores are isolated using CPUAffinity in /etc/systemd/system.conf and /etc/systemd/user.conf
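For illustration, the relevant setting might look like this (values are hypothetical; CPUAffinity here lists the cores left to systemd, so the remaining cores, including core 19 used above, stay isolated from ordinary processes):

    # /etc/systemd/system.conf (and similarly user.conf)
    CPUAffinity=0 1 2 3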

/etc/sysconfig/irqbalance is configured.

OpenJDK 11.0.6 2020-01-14 LTS, Runtime Environment 18.9

Answer

A voluntary context switch usually means a thread is waiting for something, e.g. for a lock to become free.

async-profiler can help find where context switches happen. Here is the command line I used:

./profiler.sh -d 80 -e context-switches -i 2 -t -f switches.svg -I 'main*' -X 'exit_to_usermode_loop*' PID

Let's break it down:

  • -d 80 run profiler for at most 80 seconds.
  • -e context-switches an event to profile.
  • -i 2 interval = 2 events. I profile every second context-switch, since the profiling signal itself causes a context switch, and I don't want to fall into recursion.
  • -t split the profile by threads.
  • -f switches.svg output file name; svg extension automatically selects Flame Graph format.
  • -I 'main*' include only the main thread in the output.
  • -X 'exit_to_usermode_loop*' exclude events related to nonvoluntary context switches.
  • PID Java process ID to profile.

The results may differ from one run to another. Typically I see from 0 to 3 context switches on each graph.

Here are the most common places where a context switch happens. They are indeed related to waiting on a mutex.

  1. ThreadSafepointState::handle_polling_page_exception() called from TestLoop.main. This means a thread was stopped at a safepoint requested by another thread. To investigate the reason for the safepoint, add the -Xlog:safepoint* JVM option.
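With that flag enabled, for example by extending the launch command above (the exact combination of flags is an assumption):

    taskset -c 19 java -Xlog:safepoint* -DtestLoopBindingCpu=3 -Ddelay=74 -cp demo-all.jar main.TestLoop

the log around the 74-second mark looks like this: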

[75.889s][info][safepoint        ] Application time: 74.0071000 seconds
[75.889s][info][safepoint        ] Entering safepoint region: Cleanup
[75.889s][info][safepoint,cleanup] deflating idle monitors, 0.0000003 secs
[75.889s][info][safepoint,cleanup] updating inline caches, 0.0000058 secs
[75.890s][info][safepoint,cleanup] compilation policy safepoint handler, 0.0000004 secs
[75.890s][info][safepoint,cleanup] purging class loader data graph, 0.0000001 secs
[75.890s][info][safepoint,cleanup] resizing system dictionaries, 0.0000009 secs
[75.890s][info][safepoint,cleanup] safepoint cleanup tasks, 0.0001440 secs
[75.890s][info][safepoint        ] Leaving safepoint region

Right, a Cleanup safepoint happens shortly after 74 seconds (exactly the specified delay). The purpose of a Cleanup safepoint is to run periodic tasks; in this case, to update inline caches. If there is cleanup work to do, a safepoint may happen every GuaranteedSafepointInterval milliseconds (1000 by default). You can disable periodic safepoints by setting -XX:GuaranteedSafepointInterval=0, but this may have performance implications.
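A sketch of disabling periodic safepoints for this test (note: in some JDK builds GuaranteedSafepointInterval is a diagnostic flag, hence the unlock option; combining it with the original command is an assumption):

    taskset -c 19 java -XX:+UnlockDiagnosticVMOptions -XX:GuaranteedSafepointInterval=0 -DtestLoopBindingCpu=3 -Ddelay=74 -cp demo-all.jar main.TestLoop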

  2. SharedRuntime::handle_wrong_method() called from TimeUtils.now. This happens when a call site in the compiled code has been made non-entrant. Since this is related to JIT compilation, add the -XX:+PrintCompilation option.

  75032 1430 %     4       main.TestLoop::main @ 149 (245 bytes)   made not entrant
  75033 1433 %     3       main.TestLoop::main @ 149 (245 bytes)
  75033 1434       4       util.RealtimeNanoClock::nanoTime (8 bytes)
  75034 1431       3       util.RealtimeNanoClock::nanoTime (8 bytes)   made not entrant
  75039 1435 %     4       main.TestLoop::main @ 149 (245 bytes)
  75043 1433 %     3       main.TestLoop::main @ 149 (245 bytes)   made not entrant

Yes, both TestLoop.main and RealtimeNanoClock.nanoTime were recompiled 75 seconds after JVM start. To find out the reason, add -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation.
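For example (again assuming the same launch command as above):

    taskset -c 19 java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -DtestLoopBindingCpu=3 -Ddelay=74 -cp demo-all.jar main.TestLoop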

This produces a large compilation log, in which we look for an event that happened at the 75th second.

<uncommon_trap thread='173414' reason='unstable_if' action='reinterpret' debug_id='0' compile_id='1232' compile_kind='osr' compiler='c2' level='4' stamp='75.676'>
<jvms bci='161' method='main.TestLoop main ([Ljava/lang/String;)V' bytes='245' count='1' backedge_count='533402' iicount='1'/>

That was an uncommon trap due to unstable_if at bytecode index 161. In other words, when main was JIT compiled, HotSpot did not produce code for the else branch, because it had never been executed before (speculative dead code elimination). However, to retain the correctness of the compiled code, HotSpot places a trap to deoptimize and fall back to the interpreter if the speculative condition fails. This is exactly what happens in your case when the if condition becomes false.

  3. Runtime1::counter_overflow(). This is again related to recompilation. After running C1-compiled code for some time, HotSpot discovers that the code is hot and decides to recompile it with C2.

In this case I caught a contended lock on the compiler queue.

Conclusion

HotSpot JIT compilers rely heavily on speculative optimizations. When a speculative condition fails, this leads to deoptimization. Deoptimization is indeed very bad for low-latency applications: besides switching to slow execution in the interpreter, it may indirectly cause undesired pauses by acquiring locks in the JVM runtime or by bringing the JVM to a safepoint.

Common reasons for deoptimization are unstable_if and class_check. If you want to avoid deoptimization on a latency-critical path, make sure to warm up all code paths and all possible receivers of virtual methods, as in the sketch below.
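For this particular test, a warm-up could look like the following minimal sketch (hypothetical code, not from the repo; the point is to drive the comparison through both branches before the measured run, so the unstable_if speculation is never violated later):

    static long sink; // prevents the JIT from eliminating the branch bodies

    static void warmUp()
    {
        final long start = TimeUtils.now();
        // A threshold that flips from "not reached" to "reached" during
        // warm-up, so the JIT profiles and compiles both branches.
        final long threshold = start + TimeUtils.NANOS_IN_SECOND / 10; // ~100 ms
        for (int i = 0; i < 10_000_000; i++)
        {
            long now = TimeUtils.now();
            if (now < threshold)
            {
                sink += now; // same shape as the real "if" branch
            }
            else
            {
                sink -= now; // makes the "else" path hot before the real run
            }
        }
    }

Whether 10 M iterations and a 100 ms threshold are enough depends on the tiered-compilation thresholds; the point is only that both branches execute before the latency-critical phase begins.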

