Linux not respecting SCHED_FIFO priority? (normal or GDB execution)
Problem description
TL;DR
It seems that on multiprocessor/multicore machines, several RT SCHED_FIFO threads may be scheduled at the same time, one per execution unit. So a thread with priority 60 and a thread with priority 40 may run simultaneously on 2 different cores.
This may be counter-intuitive, especially when simulating embedded systems that run on a single-core processor and rely on strict priority execution.
For simulation you cannot modify the original code, but you can restrict the execution of the simulated code to a single core (e.g. with the taskset shell command or with sched_setaffinity(...)).
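The single-core restriction mentioned above can be sketched as follows. This is a minimal illustration, assuming Linux with glibc and g++ (which defines _GNU_SOURCE, making the CPU_* macros available). The helper names pin_to_single_cpu and allowed_cpu_count are hypothetical and do not come from the original post.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux/glibc)

// Hypothetical helper (not from the original post): restrict the calling
// thread to the first CPU it is currently allowed to run on. Threads
// created afterwards inherit this affinity mask.
// Returns the chosen CPU index, or -1 on failure.
int pin_to_single_cpu()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t only;
            CPU_ZERO(&only);
            CPU_SET(cpu, &only);
            return sched_setaffinity(0, sizeof(only), &only) == 0 ? cpu : -1;
        }
    }
    return -1;
}

// Number of CPUs the calling thread may currently run on, or -1 on failure.
int allowed_cpu_count()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    return CPU_COUNT(&allowed);
}
```

Calling pin_to_single_cpu() at the start of main, before any std::thread is created, should have roughly the same effect as launching the unmodified program under taskset -c with a single CPU number.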
Original problem description
Even with very simple code, I have difficulty making Linux respect the priority of my threads under the SCHED_FIFO scheduling policy.
- See the MCVE at the end of the question.
- See the modified MCVE
This situation comes from the need to simulate embedded code on a Linux PC in order to perform integration tests.
The main thread, with FIFO priority 10, launches the threads divisor and ratio.
The divisor thread should get priority 2, so that the ratio thread, with priority 1, will not evaluate a/b before b gets a decent value (this is a completely hypothetical scenario for the MCVE only, not a real-life case with semaphores or condition variables).
Potential prerequisite: you need to be root or, better, to setcap the program so that it can change its scheduling policy and priority.
sudo setcap cap_sys_nice+ep main
johndoe@VirtualBox:~/Code/gdb_sched_fifo$ getcap main
main = cap_sys_nice+ep
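Whether the capability is actually in effect can be checked from inside the program. The sketch below is only an illustration with hypothetical helper names (try_set_fifo, current_policy). It relies on the fact that pthread_setschedparam returns the error number directly, so the return value is what gets passed to strerror.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cerrno>
#include <cstring>
#include <iostream>

// Hypothetical helper (not from the original post): request SCHED_FIFO at
// the given priority for the calling thread.
// Returns 0 on success, otherwise the error number (typically EPERM when
// the process lacks CAP_SYS_NICE and is not root).
int try_set_fifo(int priority)
{
    sched_param param{};
    param.sched_priority = priority;
    const int ret = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    if (ret != 0)
        std::cerr << "pthread_setschedparam: " << std::strerror(ret)
                  << " (" << ret << ")\n";
    return ret;
}

// Reads back the scheduling policy in effect for the calling thread,
// or -1 if it cannot be queried.
int current_policy()
{
    int policy = -1;
    sched_param param{};
    if (pthread_getschedparam(pthread_self(), &policy, &param) != 0)
        return -1;
    return policy;
}
```

If try_set_fifo(2) returns 0, current_policy() should report SCHED_FIFO; otherwise the thread keeps its previous policy and any priority-based reasoning about it no longer applies.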
- First experiments were done under a VirtualBox environment with 2 vCPUs (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git), where the code behaviour was almost OK under normal execution but NOK under GDB.
- Other experiments on native Ubuntu 20.04 show very frequent NOK behaviour even in normal execution, on an I3-1005 2C/4T (gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1).
Basic compilation:
johndoe@VirtualBox:~/Code/gdb_sched_fifo$ g++ main.cc -o main -pthread
Without root or setcap, normal execution is sometimes OK and sometimes not:
johndoe@VirtualBox:~/Code/gdb_sched_fifo$ ./main
Problem with setschedparam: Operation not permitted(1) <<-- err msg if no root or setcap
Result: 0.333333 or Result: Inf <<-- 1/3 or div by 0
Normal execution OK (e.g. with setcap):
johndoe@VirtualBox:~/Code/gdb_sched_fifo$ ./main
Result: 0.333333
Now, if you want to debug this program, you get the error message again.
(gdb) run
Starting program: /home/johndoe/Code/gdb_sched_fifo/main
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f929a6a9700 (LWP 2633)]
Problem with setschedparam: Operation not permitted(1) <<--- ERROR MSG
Result: inf <<--- DIV BY 0
[New Thread 0x7f9299ea8700 (LWP 2634)]
[Thread 0x7f929a6a9700 (LWP 2633) exited]
[Thread 0x7f9299ea8700 (LWP 2634) exited]
[Inferior 1 (process 2629) exited normally]
This is explained in the question "gdb appears to ignore executable capabilities" (almost all of its answers may be relevant).
So in my case I did:
- sudo setcap cap_sys_nice+ep /usr/bin/gdb
- create a ~/.gdbinit with
  set startup-with-shell off
And as a result I got:
(gdb) run
Starting program: /home/johndoe/Code/gdb_sched_fifo/main
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6e85700 (LWP 2691)]
Result: inf <<-- NO ERR MSG but DIV BY 0
[New Thread 0x7ffff6684700 (LWP 2692)]
[Thread 0x7ffff6e85700 (LWP 2691) exited]
[Thread 0x7ffff6684700 (LWP 2692) exited]
[Inferior 1 (process 2687) exited normally]
(gdb)
So, conclusion and question
- I thought the only problem came from GDB
- Tests on another (non-virtual) target show even worse results under normal execution
I saw other questions about RT SCHED_FIFO not being respected, but I find that their answers reach no conclusion or an unclear one. My MCVE is also much smaller, with fewer potential side effects.
SCHED_FIFO higher priority thread is getting preempted by the SCHED_FIFO lower priority thread?
Comments brought some pieces of an answer, but I am still not convinced ... ( ... it should work like this )
MCVE:
#include <chrono>
#include <cstring>
#include <iostream>
#include <thread>
#include <pthread.h>
#include <sched.h>

double a = 1.0F;
double b = 0.0F;

void ratio(void)
{
    struct sched_param param;
    param.sched_priority = 1;
    int ret = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    // pthread functions return the error number directly, so report ret.
    if ( 0 != ret )
        std::cout << "Problem with setschedparam: " << std::strerror(ret) << '(' << ret << ')' << "\n" << std::flush;
    // b is read here without any synchronization with divisor().
    std::cout << "Result: " << a/b << "\n" << std::flush;
}

void divisor(void)
{
    struct sched_param param;
    param.sched_priority = 2;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    b = 3.0F;
    std::this_thread::sleep_for(std::chrono::milliseconds(2000u));
}

int main(int argc, char * argv[])
{
    struct sched_param param;
    param.sched_priority = 10;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);

    std::thread thr_ratio(ratio);
    std::thread thr_divisor(divisor);

    thr_ratio.join();
    thr_divisor.join();

    return 0;
}
Recommended answer
There are a few things obviously wrong with your MCVE:
- You have a data race on b, i.e. undefined behavior, so anything can happen.
- You are expecting that the divisor thread will have finished its pthread_setschedparam call before the ratio thread gets to computing the ratio. But there is absolutely no guarantee that the first thread will not run to completion long before the second thread is even created.
Indeed, that is likely what is happening under GDB: it must trap thread creation and destruction events in order to keep track of all the threads, so thread creation under GDB is significantly slower than outside of it.
To fix the second problem, add a counting semaphore, and have both threads rendezvous after each has executed its pthread_setschedparam call.