Linux not respecting SCHED_FIFO priority? (normal or GDB execution)


Question

TL;DR

On multiprocessor/multicore machines, several real-time SCHED_FIFO threads may be scheduled at the same time on different execution units. So a thread with priority 60 and a thread with priority 40 may run simultaneously on two different cores.

This may be counter-intuitive, especially when simulating embedded systems that (as is still often the case today) run on single-core processors and rely on strict priority execution.
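One way to approximate single-core behaviour on a multicore host is to restrict the process to a single CPU before any thread is created, so that the SCHED_FIFO threads have to compete for one execution unit. The sketch below is an assumption about how this could be done with the Linux-specific `sched_setaffinity` call; the helper names `allowed_cpu_count` and `pin_to_first_allowed_cpu` are hypothetical and are not part of the original MCVE.

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (glibc; g++ defines _GNU_SOURCE by default)

// Number of CPUs the calling thread is currently allowed to run on, or -1 on error.
inline int allowed_cpu_count() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}

// Restrict the calling thread to the first CPU it is allowed to use.
// Threads created afterwards inherit this affinity mask.
// Returns the chosen CPU index, or -1 on error.
inline int pin_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;  // no allowed CPU found
}
```

Calling `pin_to_first_allowed_cpu()` at the very start of `main`, before the first `std::thread` is constructed, should have an effect similar to launching the program with `taskset -c 0 ./main`.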

See my other answer in this post for a summary.

Original problem description

Even with very simple code, I have difficulty making Linux respect the priority of my threads under the SCHED_FIFO scheduling policy.

  • See the MCVE at the end of the question.
  • See the modified MCVE in the answers.

This situation comes from the need to simulate embedded code on a Linux PC in order to perform integration tests.

The main thread, with FIFO priority 10, launches the threads divisor and ratio.

The divisor thread should get priority 2 so that the ratio thread, with priority 1, does not evaluate a/b before b gets a decent value (this is a completely hypothetical scenario for the MCVE only, not a real-life case with semaphores or condition variables).

Potential prerequisite: you need to be root or, better, to setcap the program so that it can change the scheduling policy and priority.

sudo setcap cap_sys_nice+ep main

johndoe@VirtualBox:~/Code/gdb_sched_fifo$ getcap main
main = cap_sys_nice+ep
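Whether the capability is actually in effect can be checked from inside the program by looking at the return value of `pthread_setschedparam`, which is an error number rather than a value stored in `errno`. The following is a minimal sketch; the helper names `try_set_fifo` and `report_fifo` are hypothetical and only serve the illustration.

```cpp
#include <pthread.h>
#include <sched.h>

#include <cerrno>
#include <cstring>
#include <iostream>

// Try to switch the calling thread to SCHED_FIFO at the given priority.
// Returns 0 on success, or an error number such as EPERM when the process
// is neither root nor granted CAP_SYS_NICE.
inline int try_set_fifo(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}

// Same as try_set_fifo, but prints a diagnostic on failure.
inline bool report_fifo(int priority) {
    const int rc = try_set_fifo(priority);
    if (rc != 0) {
        std::cerr << "SCHED_FIFO priority " << priority << " refused: "
                  << std::strerror(rc) << " (" << rc << ")\n";
        return false;
    }
    return true;
}
```

The valid priority range for the policy can be queried with `sched_get_priority_min(SCHED_FIFO)` and `sched_get_priority_max(SCHED_FIFO)` before calling `report_fifo`.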

  • The first experiments were done in a VirtualBox environment with 2 vCPUs (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git), where the code behaviour was almost OK under normal execution but NOK under GDB.
  • Other experiments on native Ubuntu 20.04 show very frequent NOK behaviour even in normal execution, with an I3-1005 2C/4T (gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1).

      Basic compilation:

      johndoe@VirtualBox:~/Code/gdb_sched_fifo$ g++ main.cc -o main -pthread
      

      Normal execution is sometimes OK and sometimes not if there is no root or no setcap:

      johndoe@VirtualBox:~/Code/gdb_sched_fifo$ ./main
      Problem with setschedparam: Operation not permitted(1)  <<-- err msg if no root or setcap
      Result: 0.333333 or Result: Inf                         <<-- 1/3 or div by 0
      

      Normal execution OK (e.g. with setcap):

      johndoe@VirtualBox:~/Code/gdb_sched_fifo$ ./main
      Result: 0.333333
      

      Now if you want to debug this program, you get the error message again.

      (gdb) run
      Starting program: /home/johndoe/Code/gdb_sched_fifo/main 
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      [New Thread 0x7f929a6a9700 (LWP 2633)]
      Problem with setschedparam: Operation not permitted(1)     <<--- ERROR MSG
      Result: inf                                                <<--- DIV BY 0
      [New Thread 0x7f9299ea8700 (LWP 2634)]
      [Thread 0x7f929a6a9700 (LWP 2633) exited]
      [Thread 0x7f9299ea8700 (LWP 2634) exited]
      [Inferior 1 (process 2629) exited normally]
      

      This is explained in the question "gdb appears to ignore executable capabilities" (almost all answers may be relevant).

      So in my case I did:

      • sudo setcap cap_sys_nice+ep /usr/bin/gdb
      • create a ~/.gdbinit with set startup-with-shell off

      As a result I got:

      (gdb) run
      Starting program: /home/johndoe/Code/gdb_sched_fifo/main 
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      [New Thread 0x7ffff6e85700 (LWP 2691)]
      Result: inf                              <<-- NO ERR MSG but DIV BY 0 
      [New Thread 0x7ffff6684700 (LWP 2692)]
      [Thread 0x7ffff6e85700 (LWP 2691) exited]
      [Thread 0x7ffff6684700 (LWP 2692) exited]
      [Inferior 1 (process 2687) exited normally]
      (gdb) 
      

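      Conclusion and question

      • I thought the only problem came from GDB.
      • Tests on another (non-virtual) target showed even worse results under normal execution.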

      I saw other questions about RT SCHED_FIFO priorities not being respected, but I find that their answers have no conclusion or an unclear one. My MCVE is also much smaller, with fewer potential side effects:

      Linux SCHED_FIFO not respecting thread priorities

      SCHED_FIFO higher priority thread is getting preempted by the SCHED_FIFO lower priority thread?

      The comments brought some pieces of an answer, but I am still not convinced ... ( ... it should work like this )

      MCVE:

      #include <pthread.h>
      #include <sched.h>

      #include <chrono>
      #include <cstring>
      #include <iostream>
      #include <thread>

      double a = 1.0F;
      double b = 0.0F;

      void ratio(void)
      {
          struct sched_param param;
          param.sched_priority = 1;
          // pthread functions return the error number directly instead of setting errno.
          int ret = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
          if ( 0 != ret )
              std::cout << "Problem with setschedparam: " << std::strerror(ret) << '(' << ret << ')' << "\n" << std::flush;

          // b is read here without any synchronization with divisor().
          std::cout << "Result: " << a/b << "\n" << std::flush;
      }

      void divisor(void)
      {
          struct sched_param param;
          param.sched_priority = 2;
          pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);

          b = 3.0F;

          std::this_thread::sleep_for(std::chrono::milliseconds(2000u));
      }


      int main(int argc, char * argv[])
      {
          struct sched_param param;
          param.sched_priority = 10;
          pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);

          // ratio is started first, divisor second.
          std::thread thr_ratio(ratio);
          std::thread thr_divisor(divisor);

          thr_ratio.join();
          thr_divisor.join();

          return 0;
      }
      

      Recommended answer

      There are a few things obviously wrong with your MCVE:

      1. You have a data race on b, i.e. undefined behavior, so anything can happen.

      2. You are expecting that the divisor thread will have finished its pthread_setschedparam call before the ratio thread gets to computing the ratio.

      But there is absolutely no guarantee that the first thread will not run to completion long before the second thread is even created.

      Indeed, that is what is likely happening under GDB: it must trap thread creation and destruction events in order to keep track of all the threads, so thread creation under GDB is significantly slower than outside of it.

      To fix the second problem, add a counting semaphore and have both threads rendezvous after each has executed its pthread_setschedparam call.
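A minimal sketch of that rendezvous, assuming POSIX semaphores (`sem_t`) and a pair of them rather than a single one; the names `Rendezvous`, `DemoResult`, `set_fifo_priority` and `run_demo` are hypothetical. To address the first problem as well, `b` is made a `std::atomic<double>` here, which is a choice of this sketch and not something the answer prescribes.

```cpp
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>

#include <atomic>
#include <cassert>
#include <cerrno>
#include <thread>

// Two-party rendezvous built from two POSIX counting semaphores:
// each side signals its own arrival, then waits for the other side.
class Rendezvous {
public:
    Rendezvous() { sem_init(&arrived_[0], 0, 0); sem_init(&arrived_[1], 0, 0); }
    ~Rendezvous() { sem_destroy(&arrived_[0]); sem_destroy(&arrived_[1]); }
    void meet(int me) {
        assert(me == 0 || me == 1);
        sem_post(&arrived_[me]);
        // Retry if the wait is interrupted by a signal.
        while (sem_wait(&arrived_[1 - me]) == -1 && errno == EINTR) {}
    }
private:
    sem_t arrived_[2];
};

struct DemoResult {
    int seen_by_ratio;    // threads already configured when ratio passed the rendezvous
    int seen_by_divisor;  // same, as observed by divisor
    double result;        // a / b as computed by ratio
};

// Returns 0 on success or an error number (EPERM without root or CAP_SYS_NICE).
inline int set_fifo_priority(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}

inline DemoResult run_demo() {
    Rendezvous rendezvous;
    std::atomic<int> configured{0};
    std::atomic<double> b{0.0};  // atomic, so accesses to b are not a data race
    const double a = 1.0;
    DemoResult out{};

    std::thread ratio([&] {
        set_fifo_priority(1);    // return value ignored in this sketch
        ++configured;
        rendezvous.meet(0);      // neither thread continues until both are configured
        out.seen_by_ratio = configured.load();
        out.result = a / b.load();
    });
    std::thread divisor([&] {
        set_fifo_priority(2);
        ++configured;
        rendezvous.meet(1);
        out.seen_by_divisor = configured.load();
        b.store(3.0);
    });

    ratio.join();
    divisor.join();
    return out;
}
```

The rendezvous only guarantees that both threads have applied their scheduling parameters before either one continues. On a multicore machine the two threads can still run simultaneously after that point, so the printed ratio may be either 1/3 or inf unless the read of `b` is explicitly ordered after the write.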
