Why one non-voluntary context switch per second?

Problem description

The OS is RHEL 6 (2.6.32). I have isolated a core and am running a compute intensive thread on it. /proc/{thread-id}/status shows one non-voluntary context switch every second.

The thread in question is a SCHED_NORMAL thread and I don't want to change this.

How can I reduce this number of non-voluntary context switches? Does this depend on any scheduling parameters in /proc/sys/kernel?

EDIT: Several responses suggest alternative approaches. Before going that route, I first want to understand why I am getting exactly one non-voluntary context switch per second, even over hours of running. For example, is this caused by CFS? If so, which parameters are involved, and how?

EDIT2: Further clarification - the first question I would like answered is this: why am I getting one non-voluntary context switch per second rather than, say, one every half second or every two seconds?

Solution

This is a guess, but an educated one. Since you use an isolated CPU, the scheduler does not schedule any task on it other than your own, with one exception: the vmstat code in the kernel has a timer that queues a single work queue item on each CPU once per second to calculate memory usage statistics. That work item is what you see being scheduled every second.

The work queue code is smart enough to not schedule the work queue kernel thread if the core is 100% idle but not if it is running a single task.
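Before tracing, it may be worth confirming the rate itself. The following is a minimal Python sketch, not something from the original setup: it samples the nonvoluntary_ctxt_switches field of /proc/{thread-id}/status twice and reports switches per second. The function names and the 10 second default window are my own choices for illustration.

```python
import re
import time


def parse_nonvoluntary(status_text):
    """Return the nonvoluntary_ctxt_switches counter from /proc status text."""
    match = re.search(
        r"^nonvoluntary_ctxt_switches:\s*(\d+)", status_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("nonvoluntary_ctxt_switches field not found")
    return int(match.group(1))


def switch_rate(tid, seconds=10.0):
    """Sample the counter twice and return involuntary switches per second."""
    path = f"/proc/{tid}/status"  # tid is the thread id of the compute thread
    with open(path) as f:
        before = parse_nonvoluntary(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_nonvoluntary(f.read())
    return (after - before) / seconds
```

If the vmstat explanation holds, switch_rate(tid) should come out close to 1.0 with the default settings.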

You can verify this using ftrace. Use the sched_switch tracer to see which entity you switch to once every second or so (the timer is rounded to the nearest jiffy and does not count while the CPU is idle, so the timing may be skewed). If it is the events/CPU_NUMBER task (or keventd on older kernels), then the cause is almost certainly the vmstat_update function, which sets its timer to queue a work queue item every second for the events kernel thread to run.
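Once you have captured some sched_switch output, a small script can tally which tasks get switched in on the isolated CPU. This Python sketch assumes the `prev_comm=... ==> next_comm=... next_pid=...` line layout printed by newer kernels; 2.6.32 may format the event differently, so treat the regular expression as an assumption to adapt. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "... [003] ... sched_switch: ... ==> next_comm=X next_pid=N ..."
# Older kernels may print these fields differently; adjust the pattern if so.
SWITCH_RE = re.compile(
    r"\[(\d+)\].*sched_switch:.*==>\s+next_comm=(.+?)\s+next_pid=(\d+)"
)


def switched_in_tasks(trace_lines, cpu):
    """Count, per task name, how often each task was switched in on `cpu`."""
    counts = Counter()
    for line in trace_lines:
        match = SWITCH_RE.search(line)
        if match and int(match.group(1)) == cpu:
            counts[match.group(2)] += 1
    return counts
```

Feeding it the lines of the trace file and the number of the isolated CPU should show the events/CPU_NUMBER task appearing roughly once per second of trace if vmstat_update is the cause.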

Note that the period at which vmstat sets its timer is configurable - you can set it to another value via the vm.stat_interval sysctl knob. Increasing this value will give you a lower rate of such interruptions at the cost of less accurate memory usage statistics.
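To see what the knob is currently set to and what a different value would buy you, here is a rough Python sketch. It reads the sysctl through its /proc/sys/vm/stat_interval file (the value is in seconds) and does the simple arithmetic for interruptions per hour; the helper names are made up for illustration.

```python
def read_stat_interval(path="/proc/sys/vm/stat_interval"):
    """Return the vmstat update interval in seconds, as exposed by sysctl."""
    with open(path) as f:
        return int(f.read().strip())


def interruptions_per_hour(interval_seconds):
    """Expected vmstat wakeups per hour on a busy CPU for a given interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return 3600 / interval_seconds
```

With an interval of 1 second that is 3600 interruptions per hour; raising it to 10 would bring the expected figure down to 360. Changing the value needs root, for example with sysctl -w vm.stat_interval=10.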

I maintain a wiki with all the sources of interruptions to isolated CPU workloads here. I also have a patch in the works that stops vmstat from scheduling the work queue item if nothing has changed between one vmstat work queue run and the next - as would happen if the single task on your CPU does not use any dynamic memory allocation. I am not sure it will benefit you, though - it depends on your workload.
