Real time Linux: disable local timer interrupts

Problem Description

TL;DR: Using a real-time Linux kernel with NO_HZ_FULL, I need to isolate a process in order to get deterministic results, but /proc/interrupts tells me there are still local timer interrupts (among others). How can I disable them?

Long version:

I want to make sure my program is not being interrupted, so I am trying to use a real-time Linux kernel. I'm using the real-time version of Arch Linux (linux-rt on the AUR), and I modified the kernel configuration to select the following options:

CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
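
(To confirm the running kernel actually has these set, you can read /proc/config.gz — this assumes the kernel was built with CONFIG_IKCONFIG_PROC, as Arch kernels usually are:)

zgrep -E 'NO_HZ_FULL|RCU_NOCB' /proc/config.gz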

Then I rebooted my computer to boot this real-time kernel with the following options:

nmi_watchdog=0
rcu_nocbs=1
nohz_full=1
isolcpus=1
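
(For reference: with GRUB, these are typically appended to GRUB_CMDLINE_LINUX in /etc/default/grub and the configuration is regenerated with grub-mkconfig -o /boot/grub/grub.cfg; adjust for your bootloader:)

GRUB_CMDLINE_LINUX="nmi_watchdog=0 rcu_nocbs=1 nohz_full=1 isolcpus=1"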

I also disabled the following options in the BIOS:

C-states
Intel SpeedStep
Turbo mode
VT-x
VT-d
Hyper-Threading

My CPU (i7-6700, 3.40 GHz) has 4 cores (8 logical CPUs with Hyper-Threading, which I disabled above), and I can see CPU0, CPU1, CPU2, and CPU3 in the /proc/interrupts file.

CPU1 is isolated by the isolcpus kernel parameter, and I want to disable the local timer interrupts on this CPU. I thought a real-time kernel with CONFIG_NO_HZ_FULL and CPU isolation (isolcpus) would be enough to do it, and I tried to check by running these commands:

cat /proc/interrupts | grep LOC > ~/tmp/log/overload_cpu1
taskset -c 1 ./overload
cat /proc/interrupts | grep LOC >> ~/tmp/log/overload_cpu1

where the overload process is:

overload.c:

/* Busy loop that keeps one CPU fully loaded for a few seconds. */
int main(void)
{
  /* volatile keeps the compiler from optimizing the empty loops away */
  for (volatile int i = 0; i < 100; ++i)
    for (volatile int j = 0; j < 100000000; ++j)
      ;
  return 0;
}

The file overload_cpu1 contains the result:

LOC:     234328        488      12091      11299   Local timer interrupts
LOC:     239072        651      12215      11323   Local timer interrupts

meaning 651 - 488 = 163 interrupts from the local timer, not 0...

For comparison, I ran the same experiment but changed the core where my overload process runs (still watching the interrupt count on CPU1):

taskset -c 0 :   8 interrupts
taskset -c 1 : 163 interrupts
taskset -c 2 :   7 interrupts
taskset -c 3 :   8 interrupts

One of my questions is: why are there not 0 interrupts? And why is the number of interrupts bigger when my process runs on CPU1? (I thought NO_HZ_FULL would prevent interrupts when my process runs alone: "The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid sending scheduling-clock interrupts to CPUs with a single runnable task", https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt.)

Maybe one explanation is that other processes are running on CPU1. I checked using the ps command:

CLS CPUID RTPRIO PRI  NI CMD                           PID
TS      1      -  19   0 [cpuhp/1]                      18
FF      1     99 139   - [migration/1]                  20
TS      1      -  19   0 [rcuc/1]                       21
FF      1      1  41   - [ktimersoftd/1]                22
TS      1      -  19   0 [ksoftirqd/1]                  23
TS      1      -  19   0 [kworker/1:0]                  24
TS      1      -  39 -20 [kworker/1:0H]                 25
FF      1      1  41   - [posixcputmr/1]                28
TS      1      -  19   0 [kworker/1:1]                 247
TS      1      -  39 -20 [kworker/1:1H]                501
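
(The listing above can be produced with an invocation along these lines — my reconstruction; cpuid is procps' alias for psr, the CPU a task last ran on:)

ps -eo class,cpuid,rtprio,pri,ni,cmd,pid | awk '$2 == 1'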

As you can see, there are threads on CPU1. Is it possible to disable these processes? I assume it must be, because otherwise NO_HZ_FULL could never work, right?

Tasks with class TS don't bother me, because they have no priority over SCHED_FIFO and I can set that policy for my program. The same goes for tasks with class FF and a priority less than 99.

However, you can see migration/1, which is SCHED_FIFO with priority 99. Maybe such processes cause interrupts when they run. That would explain the few interrupts seen when my process is on CPU0, CPU2, or CPU3 (8, 7, and 8 interrupts respectively), but it also means those processes run rarely, so it does not explain why there are so many interrupts when my process runs on CPU1 (163 interrupts).

I also ran the same experiment, but with my overload process under SCHED_FIFO, and I got:

taskset -c 0 : 1
taskset -c 1 : 4063
taskset -c 2 : 1
taskset -c 3 : 0
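
(For reference, I launched these runs roughly like this, using chrt from util-linux; the exact real-time priority shown is illustrative:)

taskset -c 1 chrt -f 98 ./overload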

In this configuration there are more interrupts when my process uses the SCHED_FIFO policy on CPU1, and fewer on the other CPUs. Do you know why?

Solution

The thing is that a full-tickless CPU (a.k.a. adaptive-ticks, configured with nohz_full=) still receives some ticks.

Most notably, the scheduler requires a timer on an isolated, full-tickless CPU for updating some state every second or so.

This is a documented limitation (as of 2019):

Some process-handling operations still require the occasional scheduling-clock tick. These operations include calculating CPU load, maintaining sched average, computing CFS entity vruntime, computing avenrun, and carrying out load balancing. They are currently accommodated by scheduling-clock tick every second or so. On-going work will eliminate the need even for these infrequent scheduling-clock ticks.

(source: Documentation/timers/NO_HZ.txt; cf. the LWN article "(Nearly) full tickless operation in 3.10" from 2013 for some background)

A more accurate method to measure the local timer interrupts (LOC row in /proc/interrupts) is to use perf. For example:

$ perf stat -a -A -e irq_vectors:local_timer_entry ./my_binary

where my_binary has threads pinned to the isolated CPUs that utilize the CPU non-stop, without invoking syscalls, for, say, 2 minutes.
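
A minimal sketch of such a binary (my illustration, not from the answer): it pins itself to the isolated CPU 1 with sched_setaffinity() and busy-loops for about two minutes; clock_gettime(CLOCK_MONOTONIC) is usually serviced by the vDSO on x86-64, so the loop stays out of the kernel:

/* pinned_spin.c - illustrative only: pin to an isolated CPU, spin ~2 min */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);                        /* CPU 1 is the isolated CPU here */
    if (sched_setaffinity(0, sizeof set, &set)) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }

    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);  /* vDSO call, no kernel entry */
    do {
        for (volatile int i = 0; i < 100000000; ++i)
            ;                                /* burn cycles without syscalls */
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (now.tv_sec - start.tv_sec < 120);  /* run for about 2 minutes */
    return 0;
}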

There are other sources of additional local timer ticks (when there is just 1 runnable task).

For example, the collection of VM stats: by default they are collected every second. Thus, I can decrease my LOC interrupts by setting a higher interval, e.g.:

# sysctl vm.stat_interval=60
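
(To make this survive reboots, the usual approach is a sysctl drop-in file; the file name here is illustrative:)

echo 'vm.stat_interval = 60' > /etc/sysctl.d/99-vmstat.conf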

Another source is the periodic check that the TSCs on the different CPUs don't drift apart; you can disable it with the following kernel option:

tsc=reliable

(Only apply this option if you really know that your TSCs don't drift.)
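
(A related sanity check via sysfs: the currently active clocksource, which on a machine like this is typically tsc:)

cat /sys/devices/system/clocksource/clocksource0/current_clocksource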

You might find other sources by recording traces with ftrace (while your test binary is running).
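
For example, timer-related events can be captured through tracefs roughly like this (a sketch; the mount point may also be /sys/kernel/tracing, and the available events vary by kernel version and architecture):

cd /sys/kernel/debug/tracing
echo 1 > events/irq_vectors/local_timer_entry/enable
echo 1 > events/timer/hrtimer_expire_entry/enable
cat trace_pipe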

Since it came up in the comments: yes, an SMI is fully transparent to the kernel. It doesn't show up as an NMI. You can only detect an SMI indirectly.
