Reliability of Linux kernel add_timer at resolution of one jiffy?


Problem description

In the code given below, there is a simple Linux kernel module (driver) which calls a function repeatedly 10 times, using add_timer at resolution of 1 jiffy (that is, the timer is scheduled to fire at jiffies + 1). Using the bash script rerun.sh, then I obtain timestamps from printout in syslog, and visualize them using gnuplot.

In most cases, I get a syslog output like this:

[ 7103.055787] Init testjiffy: 0 ; HZ: 250 ; 1/HZ (ms): 4
[ 7103.056044]  testjiffy_timer_function: runcount 1 
[ 7103.060045]  testjiffy_timer_function: runcount 2 
[ 7103.064052]  testjiffy_timer_function: runcount 3 
[ 7103.068050]  testjiffy_timer_function: runcount 4 
[ 7103.072053]  testjiffy_timer_function: runcount 5 
[ 7103.076036]  testjiffy_timer_function: runcount 6 
[ 7103.080044]  testjiffy_timer_function: runcount 7 
[ 7103.084044]  testjiffy_timer_function: runcount 8 
[ 7103.088060]  testjiffy_timer_function: runcount 9 
[ 7103.092059]  testjiffy_timer_function: runcount 10 
[ 7104.095429] Exit testjiffy

... which results with time series and delta histogram plots like these:

This is, essentially, the quality of timing that I'd expect from the code.

However - every once in a while, I get a capture like:

[ 7121.377507] Init testjiffy: 0 ; HZ: 250 ; 1/HZ (ms): 4
[ 7121.380049]  testjiffy_timer_function: runcount 1 
[ 7121.384062]  testjiffy_timer_function: runcount 2 
[ 7121.392053]  testjiffy_timer_function: runcount 3 
[ 7121.396055]  testjiffy_timer_function: runcount 4 
[ 7121.400068]  testjiffy_timer_function: runcount 5 
[ 7121.404085]  testjiffy_timer_function: runcount 6 
[ 7121.408084]  testjiffy_timer_function: runcount 7 
[ 7121.412072]  testjiffy_timer_function: runcount 8 
[ 7121.416083]  testjiffy_timer_function: runcount 9 
[ 7121.420066]  testjiffy_timer_function: runcount 10 
[ 7122.417325] Exit testjiffy

... which results with a rendering like:

... and I'm like: "WHOOOOOAAAAAA ... wait a second..." - isn't there a pulse dropped from the sequence? Meaning that add_timer missed a slot, and then fired up the function in the next 4 ms slot?

The interesting thing is, that in running these tests, I have nothing else but a terminal, web browser and a text editor started up - so I cannot really see anything running, that may hog the OS/kernel; and thus, I really cannot see a reason why the kernel would make such a big miss (of an entire jiffy period). When I read about Linux kernel timing, e.g. "The simplest and least accurate of all timers ... is the timer API", I read that "least accurate" as: "don't expect exactly 4 ms period" (as per this example) - and I don't, I'm fine with the variance shown in the (first) histogram; but I don't expect that a whole period will be missed!?

So my questions are:

  • Is this expected behavior from add_timer at this resolution (that a period can occasionally be missed)?
  • If so, is there a way to "force" add_timer to fire the function at each 4ms slot, as specified by a jiffy on this platform?
  • Is it possible that I get a "wrong" timestamp - e.g. the timestamp reflecting when the actual "print" to syslog happened, rather than when the function actually fired?
  • Note that I'm not looking for a period resolution below what corresponds to a jiffy (in this case, 4ms); nor am I looking to decrease the delta variance when the code works properly. So as I see it, I don't have "high resolution timer" demands, nor "hard real-time" demands - I just want add_timer to fire reliably. Would that be possible on this platform, without resorting to special "real-time" configurations of the kernel?

Bonus question: in rerun.sh below, you'll note two sleeps marked with MUSTHAVE; if either of them is left out/commented, OS/kernel freezes, and requires a hard reboot. And I cannot see why - is it really possible that running rmmod after insmod from bash is so fast, that it will conflict with the normal process of module loading/unloading?

Platform info:

$ cat /proc/cpuinfo | grep "processor\|model name\|MHz\|cores"
processor   : 0       # (same for 1)
model name  : Intel(R) Atom(TM) CPU N450   @ 1.66GHz
cpu MHz             : 1000.000
cpu cores   : 1
$ echo $(cat /etc/issue ; uname -a)
Ubuntu 11.04 \n \l Linux mypc 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 18:00:43 UTC 2012 i686 i686 i386 GNU/Linux
$ echo $(lsb_release -a 2>/dev/null | tr '\n' ' ')
Distributor ID: Ubuntu Description: Ubuntu 11.04 Release: 11.04 Codename: natty

Code:

$ cd /tmp/testjiffy
$ ls
Makefile  rerun.sh  testjiffy.c

Makefile:

obj-m += testjiffy.o

all:
    make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
    make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

testjiffy.c:

/*
 *  [http://www.tldp.org/LDP/lkmpg/2.6/html/lkmpg.html#AEN189 The Linux Kernel Module Programming Guide]
 */


#include <linux/module.h>   /* Needed by all modules */
#include <linux/kernel.h>   /* Needed for KERN_INFO */
#include <linux/init.h>     /* Needed for the macros */
#include <linux/jiffies.h>
#include <linux/time.h>
#define MAXRUNS 10

static volatile int runcount = 0;
static struct timer_list my_timer;

static void testjiffy_timer_function(unsigned long data)
{
  int tdelay = 100;

  runcount++;
  if (runcount == 5) {
    while (tdelay > 0) { tdelay--; } // small delay
  }

  printk(KERN_INFO
    " %s: runcount %d \n",
    __func__, runcount);

  if (runcount < MAXRUNS) {
    my_timer.expires = jiffies + 1;
    add_timer(&my_timer);
  }
}


static int __init testjiffy_init(void)
{
    printk(KERN_INFO
    "Init testjiffy: %d ; HZ: %d ; 1/HZ (ms): %d\n",
               runcount,      HZ,        1000/HZ);

  init_timer(&my_timer);

    my_timer.function = testjiffy_timer_function;
    //my_timer.data = (unsigned long) runcount;

  my_timer.expires = jiffies + 1;
    add_timer(&my_timer);
    return 0;
}

static void __exit testjiffy_exit(void)
{
    printk(KERN_INFO "Exit testjiffy\n");
}

module_init(testjiffy_init);
module_exit(testjiffy_exit);

MODULE_LICENSE("GPL");

rerun.sh:

#!/usr/bin/env bash

set -x
make clean
make
# blank syslog first
sudo bash -c 'echo "0" > /var/log/syslog'
sleep 1   # MUSTHAVE 01!
# reload kernel module/driver
sudo insmod ./testjiffy.ko
sleep 1   # MUSTHAVE 02!
sudo rmmod testjiffy
set +x

# copy & process syslog

max=0;
for ix in _testjiffy_*.syslog; do
  aa=${ix#_testjiffy_};
  ab=${aa%.syslog} ;
  case $ab in
    *[!0-9]*) ab=0;;          # reset if non-digit obtained; else
    *) ab=$(echo $ab | bc);;  # remove leading zeroes (else octal)
  esac
  if (( $ab > $max )) ; then
    max=$((ab));
  fi;
done;
newm=$( printf "%05d" $(($max+1)) );
PLPROC='chomp $_;
if (!$p) {$p=0;}; if (!$f) {$f=$_;} else {
  $a=$_-$f; $d=$a-$p;
  print "$a $d\n" ; $p=$a;
};'

set -x
grep "testjiffy" /var/log/syslog | cut -d' ' -f7- > _testjiffy_${newm}.syslog
grep "testjiffy_timer_function" _testjiffy_${newm}.syslog \
  | sed 's/\[\(.*\)\].*/\1/' \
  | perl -ne "$PLPROC" \
  > _testjiffy_${newm}.dat
set +x

cat > _testjiffy_${newm}.gp <<EOF
set terminal pngcairo font 'Arial,10' size 900,500
set output '_testjiffy_${newm}.png'
set style line 1 linetype 1 linewidth 3 pointtype 3 linecolor rgb "red"
set multiplot layout 1,2 title "_testjiffy_${newm}.syslog"
set xtics rotate by -45
set title "Time positions"
set yrange [0:1.5]
set offsets graph 50e-3, 1e-3, 0, 0
plot '_testjiffy_${newm}.dat' using 1:(1.0):xtic(gprintf("%.3se%S",\$1)) notitle with points ls 1, '_testjiffy_${newm}.dat' using 1:(1.0) with impulses ls 1
binwidth=0.05e-3
set boxwidth binwidth
bin(x,width)=width*floor(x/width) + width/2.0
set title "Delta diff histogram"
set style fill solid 0.5
set autoscale xy
set offsets graph 0.1e-3, 0.1e-3, 0.1, 0.1
plot '_testjiffy_${newm}.dat' using (bin(\$2,binwidth)):(1.0) smooth freq with boxes ls 1
unset multiplot
EOF
set -x; gnuplot _testjiffy_${newm}.gp ; set +x


Motivated by this comment by @granquet, I tried to obtain scheduler statistics from /proc/schedstat and /proc/sched_debug, by using dd through call_usermodehelper; note that this most of the time "skips" (that is, a file due to the 7th, or 6th, or Xth run of the function would be missing); but I managed to obtain two complete runs, and posted them in https://gist.github.com/anonymous/5709699 (as I noticed gist may be preferred to pastebin on SO), given the output is kinda massive; the *_11* files log a proper run, the *_17* files log a run with a "drop".

Note I also switched to mod_timer_pinned in the module, and it doesn't help much (the gist logs are obtained with the module using this function). These are the changes in testjiffy.c:

#include <linux/kmod.h> // usermode-helper API
...
char fcmd[] = "of=/tmp/testjiffy_sched00";
char *dd1argv[] = { "/bin/dd", "if=/proc/schedstat", "oflag=append", "conv=notrunc", &fcmd[0], NULL };
char *dd2argv[] = { "/bin/dd", "if=/proc/sched_debug", "oflag=append", "conv=notrunc", &fcmd[0], NULL };
static char *envp[] = {
      "HOME=/",
      "TERM=linux",
      "PATH=/sbin:/bin:/usr/sbin:/usr/bin", NULL };

static void testjiffy_timer_function(unsigned long data)
{
  int tdelay = 100;
  unsigned long tjnow;

  runcount++;
  if (runcount == 5) {
    while (tdelay > 0) { tdelay--; } // small delay
  }

  printk(KERN_INFO
    " %s: runcount %d \n",
    __func__, runcount);

  if (runcount < MAXRUNS) {
    mod_timer_pinned(&my_timer, jiffies + 1);
    tjnow = jiffies;
    printk(KERN_INFO
      " testjiffy expires: %lu - jiffies %lu => %lu / %lu\n",
      my_timer.expires, tjnow, my_timer.expires-tjnow, jiffies);
    sprintf(fcmd, "of=/tmp/testjiffy_sched%02d", runcount);
    call_usermodehelper( dd1argv[0], dd1argv, envp, UMH_NO_WAIT );
    call_usermodehelper( dd2argv[0], dd2argv, envp, UMH_NO_WAIT );
  }
}

... and this in rerun.sh:

...
set +x

for ix in /tmp/testjiffy_sched*; do
  echo $ix | tee -a _testjiffy_${newm}.sched
  cat $ix >> _testjiffy_${newm}.sched
done
set -x ; sudo rm /tmp/testjiffy_sched* ; set +x

cat > _testjiffy_${newm}.gp <<EOF
...



I'll use this post for verbose replying.

@CL.: many thanks for the answer. Good to have it confirmed that it's "possible that your timer function gets called at a later jiffy"; by logging the jiffies, I too realized that the timer function gets called at a later time - and other than that, it doesn't do anything "wrong" per se.

Good to know about the timestamps; I wonder if it is possible that the timer function hits at the right time, but the kernel preempts the kernel logging service (I believe it's klogd), so I get a delayed timestamp? However, I'm trying to create a "looped" (or rather, periodic) timer function to write to hardware, and I first noted this "drop" by realizing the PC doesn't write data at certain intervals on the USB bus; and given that the timestamps confirm that behavior, it's probably not the problem here (I guess).

I have modified the timer function so it fires relative to the scheduled time of the last timer (my_timer.expires) - again via mod_timer_pinned instead of add_timer:

static void testjiffy_timer_function(unsigned long data)
{
  int tdelay = 100;
  unsigned long tjlast;
  unsigned long tjnow;

  runcount++;
  if (runcount == 5) {
    while (tdelay > 0) { tdelay--; } // small delay
  }

  printk(KERN_INFO
    " %s: runcount %d \n",
    __func__, runcount);

  if (runcount < MAXRUNS) {
    tjlast = my_timer.expires;
    mod_timer_pinned(&my_timer, tjlast + 1);
    tjnow = jiffies;
    printk(KERN_INFO
      " testjiffy expires: %lu - jiffies %lu => %lu / %lu last: %lu\n",
      my_timer.expires, tjnow, my_timer.expires-tjnow, jiffies, tjlast);
  }
}

... and the first few tries, it works impeccably - however, eventually, I get this:

[13389.775508] Init testjiffy: 0 ; HZ: 250 ; 1/HZ (ms): 4
[13389.776051]  testjiffy_timer_function: runcount 1 
[13389.776063]  testjiffy expires: 3272445 - jiffies 3272444 => 1 / 3272444 last: 3272444
[13389.780053]  testjiffy_timer_function: runcount 2 
[13389.780068]  testjiffy expires: 3272446 - jiffies 3272445 => 1 / 3272445 last: 3272445
[13389.788054]  testjiffy_timer_function: runcount 3 
[13389.788073]  testjiffy expires: 3272447 - jiffies 3272447 => 0 / 3272447 last: 3272446
[13389.788090]  testjiffy_timer_function: runcount 4 
[13389.788096]  testjiffy expires: 3272448 - jiffies 3272447 => 1 / 3272447 last: 3272447
[13389.792070]  testjiffy_timer_function: runcount 5 
[13389.792091]  testjiffy expires: 3272449 - jiffies 3272448 => 1 / 3272448 last: 3272448
[13389.796044]  testjiffy_timer_function: runcount 6 
[13389.796062]  testjiffy expires: 3272450 - jiffies 3272449 => 1 / 3272449 last: 3272449
[13389.800053]  testjiffy_timer_function: runcount 7 
[13389.800063]  testjiffy expires: 3272451 - jiffies 3272450 => 1 / 3272450 last: 3272450
[13389.804056]  testjiffy_timer_function: runcount 8 
[13389.804072]  testjiffy expires: 3272452 - jiffies 3272451 => 1 / 3272451 last: 3272451
[13389.808045]  testjiffy_timer_function: runcount 9 
[13389.808057]  testjiffy expires: 3272453 - jiffies 3272452 => 1 / 3272452 last: 3272452
[13389.812054]  testjiffy_timer_function: runcount 10 
[13390.815415] Exit testjiffy

... which renders like:

... so, basically, I have a delay/"drop" at the +8 ms slot (which should be @3272446 jiffies), and then two function runs at the +12 ms slot (which would be @3272447 jiffies); you can even see the label on the plot rendered "more bold" because of it. This is better, in the sense that the "drop" sequence is now synchronous to a proper, non-drop sequence (which is as you said: "to avoid that one late timer function shifts all following timer calls") - however, I still miss a beat; and since I have to write bytes to hardware at each beat to keep a sustained, constant transfer rate, this unfortunately doesn't help me much.

As for the other suggestion, to "use ten timers": because of my ultimate goal (writing to hardware using a periodic lo-res timer function), I thought at first it doesn't apply - but if nothing else is possible (other than doing some special real-time kernel preparations), then I'll certainly try a scheme where I have 10 (or N) timers (maybe stored in an array) which are fired periodically one after another.

Just adding leftover relevant comments:

USB transfers are either scheduled in advance (isochronous) or have no timing guarantees (asynchronous). If your device doesn't use isochronous transfers, it's badly misdesigned. – CL. Jun 5 at 10:47

Thanks for the comment, @CL. - "... scheduled in advance (isochronous) ..." cleared up a confusion I had. I'm (eventually) targeting an FT232, which only has BULK mode - and as long as the bytes per timer hit are low, I can actually "cheat" my way through "streaming" data with add_timer; however, when I transfer amounts of bytes close to consuming the bandwidth, these "misfires" start getting noticeable as drops. So I was interested in testing the limits of that, for which I need a reliably repetitive "timer" function - is there anything else I could try to have a reliable "timer"? – sdaau Jun 5 at 12:27

@sdaau Bulk transfers are not suitable for streaming. You cannot fix shortcomings in the hardware protocol by using another kind of software timer. – CL. Jun 5 at 13:50

... and as my response to @CL. : I'm aware I wouldn't be able to fix shortcomings; I was more interested in observing these shortcomings - say, if a kernel function makes a periodic USB write, I could observe the signals on a scope/analyzer, and hopefully see in what sense is the bulk mode unsuitable. But first, I'd have to trust that the function can (at least somewhat) reliably repeat at a periodic rate (i.e. "generate" a clock/tick) - and I wasn't aware, until now, that I cannot really trust add_timer at jiffies resolution (as it is capable of relatively easily skipping a whole period). However, it seems that moving to Linux' high-resolution timers (hrtimer) does give me a reliable periodic function in this sense - so I guess that solves my problem (posted in my answer below).

Answer

Many thanks for all the comments and answers; they all pointed to things that must be taken into account - but given I'm somewhat of a forever noob, I still needed to do some more reading, before gaining some understanding (I hope a correct one). Also, I couldn't really find anything specific for periodically "ticking" functions - so I'll post a more verbose answer here.

In brief - for a reliable periodic Linux kernel function at a resolution of a jiffy, do not use add_timer (<linux/timer.h>), as it may "drop" an entire period; use high-resolution timers (<linux/hrtimer.h>) instead. In more detail:

Is it possible that I get a "wrong" timestamp - ...?

@CL.: The timestamp in the log is the time when that string was printed to the log.

So, maybe it's possible - but it turns out, that's not the problem here:

Is this expected behavior from add_timer at this resolution (that a period can occasionally be missed)?

I guess it turns out - yes:

If so, is there a way to "force" add_timer to fire the function at each 4ms slot, as specified by a jiffy on this platform?

... and (I guess again), it turns out - no.

... and (I guess again), it turns out - no.

Now, the reasons for this are somewhat subtle - and I hope if I didn't get them right, someone will correct me. First of all, the first misconception that I had, was that "a clock is just a clock" (in the sense of: even if it is implemented as computer code) - but that is not quite correct. The kernel basically has to "queue" an "event" somewhere, each time something like add_timer is used; and this request may come from anything really: from any (and all) sort(s) of driver(s), or even possibly userspace.

The problem is that this "queuing" costs - since in addition to the kernel having to handle (the equivalent of) traversing and inserting (and removing) items in an array, it also has to handle timer delays spanning several orders of magnitude (from say milliseconds to maybe 10s of seconds); and the fact that some drivers (like, apparently, those for network protocols) apparently queue a lot of timer events, which are usually cancelled before running - while other types may require a completely different behavior (like in my case - in a periodic function, you expect that most of the time, the event will usually not be cancelled; and you also queue the events one by one). On top of that, the kernel needs to handle this for uniprocessor vs. SMP vs. multiprocessor platforms. Thus, there is a cost-benefit tradeoff involved in implementing timer handling in the kernel.

It turns out, the architecture around jiffies/add_timer is designed to handle the most common devices - and for them, precision at a resolution of a jiffy is not an issue; but this also means that one cannot expect a reliable timer at the resolution of a single jiffy with this method. This is also compounded by the fact that the kernel handles these "event queues" by treating them (somewhat) like interrupt service requests (IRQs); and there are several levels of priority in IRQ handling in the kernel, where a higher-priority routine can pre-empt a lower-priority one (that is: interrupt and suspend the lower-priority routine, even if it is being executed at the time - and allow the higher-priority routine to go about its business). Or, as previously noted:

@granquet: timers run in soft irq context, which means they have the highest priority and they preempt everything running/runnable on the CPU ... except hardware interrupts, which are not disabled when servicing a soft irq. So you might (most probable explanation) get a hardware interrupt here and there that preempts your timer ... and thus you get an interrupt that is not serviced at the right time.

@CL.: It is indeed possible that your timer function gets called at a later jiffy than what expires was set to. Possible reasons are scheduling delays, other drivers that disable interrupts for too long (graphics and WLAN drivers are usual culprits), or some crappy BIOS executing SMI code.

I now think so, too - I think this could be an illustration of what happens:

  • jiffies changes to, say, 10000 (== 40000 ms @250 Hz)
  • Let's say the timer function, (queued by add_timer) is about to start running - but hasn't started running yet
  • Let's say here, the network card generates (for whatever reason) a hardware interrupt
  • The hardware interrupt, having a higher priority, triggers the kernel to pre-empt (stop and suspend) the timer function (possibly started by now, and just few instructions in);
  • That means the kernel now has to reschedule the timer function, to run at a later point - and since one only works with integer operations in the kernel, and time resolution for this kind of event is in jiffies - the best it can do is reschedule it for jiffies+1 (10001 == 40004 ms @250 Hz)
  • Now the kernel switches the context to the IRQ service routine of the network card driver, and it goes about its business
  • Let's say the IRQ service routine completes in 200 μs - that means now we're (in "absolute" terms) at 40000.2 ms - however, we are also still at 10000 jiffies
  • If the kernel now switched the context back to the timer function, it would have completed - without me necessarily noticing the delay;
  • ... however, that will not happen, because the timer function is scheduled for the next jiffy!
  • So kernel goes about its business (possibly sleeping) for the next approx 3.8 ms
  • jiffies changes to 10001 (== 40004 ms @250 Hz)
  • (the previously rescheduled) timer function runs - and this time completes without interruption

I haven't really done a detailed analysis to see if the sequence of events is exactly as described above; but I'm quite persuaded that it is something close - in other words, a resolution problem - especially since the high-resolution timer approach seems to not show this behavior. It would be great indeed, to obtain a scheduler log, and know exactly what happened to cause a pre-empt - but I doubt the roundtrip to userspace, which I attempted in OP edit, in response to @granquet's comment, is the right thing to do.

In any case, going back to this:

Note that I'm not looking for a period resolution below what corresponds to a jiffy (in this case, 4ms); nor am I looking to decrease the delta variance when the code works properly. So as I see it, I don't have "high resolution timer" demands, nor "hard real-time" demands ...

... here was a bad mistake I made - as the analysis above shows, I did have "high resolution" demands! And had I realized that earlier, I may have found relevant reading sooner. Anyways, some relevant docs - even if they don't discuss specifically periodic functions - for me, were:

  • LDD3: 5.3. Semaphores and Mutexes - (in describing a driver with different demands from here): "no accesses will be made from interrupt handlers or other asynchronous contexts. There are no particular latency (response time) requirements; application programmers understand that I/O requests are not usually satisfied immediately"
  • Documentation/timers/hrtimers.txt - "The timers.c code is very "tightly coded" around jiffies and 32-bitness assumptions, and has been honed and micro-optimized for a relatively narrow use case (jiffies in a relatively narrow HZ range) for many years - and thus even small extensions to it easily break the wheel concept"
  • T. Gleixner, D. Niehaus Hrtimers and Beyond: Transforming the Linux Time Subsystems (pdf) - (most detailed, see also diagrams inside) "The Cascading Timer Wheel (CTW), which was implemented in 1997, replaced the original time ordered double linked list to resolve the scalability problem of the linked list's O(N) insertion time... The current approach to timer management in Linux does a good job of satisfying an extremely wide range of requirements, but it cannot provide the quality of service required in some cases precisely because it must satisfy such a wide range of requirements... The timeout related timers are kept in the existing timer wheel and a new subsystem optimized for (high resolution) timer requirements hrtimers was implemented. hrtimers are entirely based on human time (units: nanoseconds)... They are kept in a time sorted, per-CPU list, implemented as a red-black tree."
  • The high-resolution timer API [LWN.net] - "At its core, the hrtimer mechanism remains the same. Rather than using the "timer wheel" data structure, hrtimers live on a time-sorted linked list, with the next timer to expire being at the head of the list. A separate red/black tree is also used to enable the insertion and removal of timer events without scanning through the list. But while the core remains the same, just about everything else has changed, at least superficially."
  • Software interrupts and realtime [LWN.net] - "The softirq mechanism is meant to handle processing that is almost — but not quite — as important as the handling of hardware interrupts. Softirqs run at a high priority (though with an interesting exception, described below), but with hardware interrupts enabled. They thus will normally preempt any work except the response to a "real" hardware interrupt... Starting with the 3.0 realtime patch set, though, that capability went away... In response, in 3.6.1-rt1, the handling of softirqs has changed again."
  • High- (but not too high-) resolution timeouts [LWN.net] - "poll() and epoll_wait() take an integer number of milliseconds; select() takes a struct timeval with microsecond resolution, and ppoll() and pselect() take a struct timespec with nanosecond resolution. They are all the same, though, in that they convert this timeout value to jiffies, with a maximum resolution between one and ten milliseconds. A programmer might program a pselect() call with a 10 nanosecond timeout, but the call may not return until 10 milliseconds later, even in the absence of contention for the CPU. ... It's a useful feature, but it comes at the cost of some significant API changes."


One thing clear from the quotes is that high-resolution timing facilities are still under active development (with API changes) in the kernel - and I was afraid that I might have to install a special "real-time patch" kernel. Thankfully, high-resolution timers are seemingly available (and working) in my 2.6.38-16 SMP kernel without any special changes. Below is the listing of the modified testjiffies.c kernel module, which now uses high-resolution timers, but otherwise keeps the same period as determined by jiffies. For testing, I made it loop 200 times (instead of 10 in the OP); and running the rerun.sh script some 20-30 times, this is the worst result I got:


The time sequence is now obviously unreadable, but the histogram can still tell us this: taking 0.00435-0.004 (= 0.004-0.00365) = 350 μs as the max deviation, it represents only 100*(350/4000) = 8.75% of the expected period, which I certainly don't have a problem with. Additionally, I never got a drop (or, correspondingly, an entire 2*period = 8 ms delay), nor a 0 ms delay - the captures I got are otherwise of the quality shown in the first image in the OP. Now, of course, I could run a longer test and see more precisely how reliable it is - but this is all the reliability I'd expect/need for this simple case; contrast that with the OP, where I'd get a drop in just 10 loops with coin-toss probability - every second or third run of the rerun.sh script would produce a drop, even under low OS resource usage!


Finally, note that the source below has the problem spotted by @CL. - "Your module is buggy: you must ensure that the timer is not pending before the module is unloaded" - fixed (in the context of hrtimer). This seemingly answers my bonus question, as it obviates the need for either of the "MUSTHAVE" sleeps in the rerun.sh script. However, note that since 200 loops @ 4 ms take 0.8 s, the sleep between insmod and rmmod is still needed if we want a full 200-tick capture (otherwise, on my machine, I capture only some 7 ticks).
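The required sleep can be computed from the module's parameters rather than hard-coded; a minimal sketch of the load/wait/unload sequence (the insmod/rmmod lines are commented out here since they need root and the built .ko, and rerun.sh itself does more than this):

```shell
# Wait long enough between insmod and rmmod for all MAXRUNS ticks:
# 200 ticks * 4 ms = 0.8 s, rounded up to whole seconds.
MAXRUNS=200
PERIOD_MS=4
CAPTURE_S=$(( (MAXRUNS * PERIOD_MS + 999) / 1000 ))   # integer ceiling
echo "sleeping ${CAPTURE_S} s between insmod and rmmod"
# sudo insmod ./testjiffy.ko
sleep "$CAPTURE_S"
# sudo rmmod testjiffy
```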


Well, hope I got this right now (at least most of it) - if not, corrections are welcome :)

testjiffy(-hr).c

#include <linux/module.h>   /* Needed by all modules */
#include <linux/kernel.h>   /* Needed for KERN_INFO */
#include <linux/init.h>     /* Needed for the macros */
#include <linux/jiffies.h>
#include <linux/time.h>
#define MAXRUNS 200

#include <linux/hrtimer.h>


static volatile int runcount = 0;

//~ static struct timer_list my_timer;
static unsigned long period_ms;
static unsigned long period_ns;
static ktime_t ktime_period_ns;
static struct hrtimer my_hrtimer;


//~ static void testjiffy_timer_function(unsigned long data)
static enum hrtimer_restart testjiffy_timer_function(struct hrtimer *timer)
{
  int tdelay = 100;
  unsigned long tjnow;
  ktime_t kt_now;
  int ret_overrun;

  runcount++;
  if (runcount == 5) {
    while (tdelay > 0) { tdelay--; } // small delay
  }

  printk(KERN_INFO
    " %s: runcount %d \n",
    __func__, runcount);

  if (runcount < MAXRUNS) {
    tjnow = jiffies;
    kt_now = hrtimer_cb_get_time(&my_hrtimer);
    ret_overrun = hrtimer_forward(&my_hrtimer, kt_now, ktime_period_ns);
    printk(KERN_INFO
      " testjiffy jiffies %lu ; ret: %d ; ktnsec: %lld \n",
      tjnow, ret_overrun, ktime_to_ns(kt_now));
    return HRTIMER_RESTART;
  }
  else return HRTIMER_NORESTART;
}


static int __init testjiffy_init(void)
{
  struct timespec tp_hr_res;
  period_ms = 1000/HZ;
  hrtimer_get_res(CLOCK_MONOTONIC, &tp_hr_res);
  printk(KERN_INFO
    "Init testjiffy: %d ; HZ: %d ; 1/HZ (ms): %ld ; hrres: %lld.%.9ld\n",
               runcount,      HZ,        period_ms, (long long)tp_hr_res.tv_sec, tp_hr_res.tv_nsec );

  hrtimer_init(&my_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
  my_hrtimer.function = &testjiffy_timer_function;
  period_ns = period_ms * 1000000UL; /* ms -> ns */
  ktime_period_ns = ktime_set(0,period_ns);
  hrtimer_start(&my_hrtimer, ktime_period_ns, HRTIMER_MODE_REL);

  return 0;
}

static void __exit testjiffy_exit(void)
{
  int ret_cancel = 0;
  /* busy-wait until a possibly running callback has finished;
     note that hrtimer_cancel() below would also wait for this */
  while( hrtimer_callback_running(&my_hrtimer) ) {
    ret_cancel++;
  }
  if (ret_cancel != 0) {
    printk(KERN_INFO " testjiffy Waited for hrtimer callback to finish (%d)\n", ret_cancel);
  }
  if (hrtimer_active(&my_hrtimer) != 0) {
    ret_cancel = hrtimer_cancel(&my_hrtimer);
    printk(KERN_INFO " testjiffy active hrtimer cancelled: %d (%d)\n", ret_cancel, runcount);
  }
  if (hrtimer_is_queued(&my_hrtimer) != 0) {
    ret_cancel = hrtimer_cancel(&my_hrtimer);
    printk(KERN_INFO " testjiffy queued hrtimer cancelled: %d (%d)\n", ret_cancel, runcount);
  }
  printk(KERN_INFO "Exit testjiffy\n");
}

module_init(testjiffy_init);
module_exit(testjiffy_exit);

MODULE_LICENSE("GPL");
