Why are processes deprived of CPU for too long while busy looping in the Linux kernel?


Question

At first glance, my question might look a bit trivial. Please bear with me and read it completely.

I have identified a busy loop in my Linux kernel module. Because of it, other processes (e.g. sshd) get no CPU time for long stretches (around 20 seconds). This is understandable: my machine has only a single CPU, and the busy loop gives the scheduler no chance to run other processes.

As an experiment, I added a call to schedule() after each iteration of the busy loop. Even though this keeps the CPU busy, it should still let other processes run, since I am calling schedule(). But that doesn't seem to be happening. My user-level processes still hang for long stretches (20 seconds).

In this case, the kernel thread has nice value -5 and the user-level threads have nice value 0. Even with the lower priority of the user-level threads, I think 20 seconds is too long to go without the CPU.

Can someone please explain why this could be happening?

Note: I know how to remove the busy loop completely, but I want to understand the kernel's behaviour here. The kernel version is 2.6.18 and kernel preemption is disabled.

Recommended answer

The schedule() function simply invokes the scheduler. It doesn't take any special measures to ensure that the calling thread is replaced by a different one. If the current thread is still the highest-priority one on the run queue, the scheduler will select it once again.

It sounds as if your kernel thread is doing very little work in its busy loop and is calling schedule() every time round. It is therefore probably not using much CPU time itself, and hence doesn't have its priority reduced much. Negative nice values carry heavier weight than positive ones, so the difference between -5 and 0 is quite pronounced. Given the combination of these two effects, I'm not too surprised that user-space processes miss out.
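One place where the heavier weight of negative nice values shows up is the time slice handed to each task. The sketch below is modelled on how I recall the 2.6-era O(1) scheduler computing time slices from the static priority. The constants and the helper names (`nice_to_prio`, `scale_prio`, `timeslice_ms`) are assumptions for illustration, not copied from the 2.6.18 source, and it works in milliseconds rather than jiffies.

```c
#include <assert.h>

/* Assumed constants, modelled on the 2.6-era O(1) scheduler (in ms). */
#define MAX_RT_PRIO      100
#define MAX_USER_PRIO    40
#define MAX_PRIO         (MAX_RT_PRIO + MAX_USER_PRIO)
#define DEF_TIMESLICE_MS 100
#define MIN_TIMESLICE_MS 5

/* Map a nice value (-20..19) to a static priority (100..139). */
static int nice_to_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return MAX_RT_PRIO + nice + 20;
}

/* Scale a base slice by how far prio is from the lowest priority. */
static int scale_prio(int base_ms, int prio)
{
    int slice = base_ms * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2);
    return slice > MIN_TIMESLICE_MS ? slice : MIN_TIMESLICE_MS;
}

/* Negative nice values start from a base slice four times larger. */
int timeslice_ms(int nice)
{
    int prio = nice_to_prio(nice);

    if (prio < nice_to_prio(0))
        return scale_prio(DEF_TIMESLICE_MS * 4, prio);
    return scale_prio(DEF_TIMESLICE_MS, prio);
}
```

Under these assumptions, `timeslice_ms(0)` gives 100 ms while `timeslice_ms(-5)` gives 500 ms, so a five-step change in nice value translates into a fivefold difference in slice length, on top of the kernel thread always being picked first.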

As an experiment, you could try calling the scheduler only every Nth iteration of the loop (you'll have to experiment to find a good value of N for your platform) and see whether the situation improves. Calling schedule() too often just wastes a lot of CPU time in the scheduler. Of course, this is only an experiment. As you have already pointed out, avoiding busy loops is the correct option in production code. If you want to be sure your thread is replaced by another, set its state to TASK_INTERRUPTIBLE before calling schedule() so that it removes itself from the run queue (as has already been mentioned in the comments), and arrange for something, such as a timeout, to wake it up again.
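The two behaviours described in this answer can be illustrated with a small user-space toy model. It is not kernel code: `toy_task`, `toy_state` and `toy_schedule` are hypothetical names, and the model ignores time slices, the expired array and dynamic priority bonuses. It only shows that a caller which stays runnable is picked again if it has the best priority, whereas a caller that first marks itself as sleeping (the analogue of setting TASK_INTERRUPTIBLE before schedule()) is taken off the run queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for TASK_RUNNING / TASK_INTERRUPTIBLE. */
enum toy_state { TOY_RUNNING, TOY_INTERRUPTIBLE };

struct toy_task {
    const char *name;
    int prio;             /* lower number means higher priority */
    enum toy_state state;
    int on_runqueue;      /* 1 while the task is eligible to run */
};

/*
 * Toy schedule(): the caller is dequeued only if it asked to sleep.
 * Then the best-priority task still on the run queue is chosen.
 * Returns the index of the chosen task, or -1 if none is runnable.
 */
int toy_schedule(struct toy_task *tasks, size_t n, size_t current)
{
    int best = -1;

    assert(current < n);
    if (tasks[current].state != TOY_RUNNING)
        tasks[current].on_runqueue = 0;

    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].on_runqueue)
            continue;
        if (best < 0 || tasks[i].prio < tasks[best].prio)
            best = (int)i;
    }
    return best;
}
```

With a kernel thread at priority 115 (nice -5) and sshd at 120 (nice 0), calling `toy_schedule` from the kernel thread while it is still `TOY_RUNNING` returns the kernel thread every time. Once it sets its state to `TOY_INTERRUPTIBLE`, sshd is chosen instead. In real kernel code the sleeping task also needs a wakeup source, which this model leaves out.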

Note that your kernel (2.6.18) uses the O(1) scheduler, which existed until the Completely Fair Scheduler (CFS) was added in 2.6.23 (the O(1) scheduler having been added in 2.6 to replace the even older O(n) scheduler). The CFS doesn't use the O(1) scheduler's priority-array run queues (it keeps runnable tasks in a red-black tree ordered by virtual runtime) and works in a different way, so you might well see different behaviour. I'm less familiar with it, however, so I wouldn't like to predict exactly what differences you'd see. I've seen enough of it to know that "completely fair" isn't the term I'd use on heavily loaded SMP systems with a large number of both cores and processes. But I also accept that writing a scheduler is a very tricky task, it's far from the worst I've seen, and I've never had a significant problem with it on a 4-8 core desktop machine.
