How to output a fixed buffer as fast as possible?
Problem description
Sample code:
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <pthread.h>

int
main (int argc, char **argv)
{
  unsigned char buffer[128] = { 0 };  /* payload; the real program fills it with data */
  char buf[0x4000];                   /* 16 KiB stdio buffer */

  setvbuf (stdout, buf, _IOFBF, 0x4000);
  fork ();                            /* two forks give 4 processes in total */
  fork ();
  pthread_t this_thread = pthread_self ();
  struct sched_param params;
  params.sched_priority = sched_get_priority_max (SCHED_RR);
  pthread_setschedparam (this_thread, SCHED_RR, &params);
  while (1)
    {
      fwrite (buffer, 128, 1, stdout);
    }
}
This program runs as 4 processes (the result of the two fork() calls), each of which writes to stdout the contents of "buffer", which is 128 bytes, or 16 long ints on a 64-bit CPU.
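One detail worth checking in a program like this: the scheduling call's return value is ignored, and on Linux a real-time policy such as SCHED_RR normally needs root or CAP_SYS_NICE, so the request can fail without any visible sign. The sketch below shows one way to detect that. It is not the asker's code: the helper name `request_max_rr` is made up for illustration, and it uses `sched_setscheduler` (the process-level call) rather than `pthread_setschedparam`.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original program): ask for SCHED_RR at
 * the highest priority for the caller. Returns 0 on success, or the errno
 * value (typically EPERM when run without privileges) on failure. */
static int request_max_rr(void)
{
    struct sched_param params;

    memset(&params, 0, sizeof params);
    params.sched_priority = sched_get_priority_max(SCHED_RR);

    /* pid 0 means "the calling thread/process". */
    if (sched_setscheduler(0, SCHED_RR, &params) == -1) {
        int err = errno;
        fprintf(stderr, "sched_setscheduler(SCHED_RR): %s\n", strerror(err));
        return err;
    }
    return 0;
}
```

If the helper reports EPERM, the program is still running under the default policy, and any throughput measured with it says nothing about SCHED_RR.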
If I then run:
./writetest | pv -ptebaSs 800G >/dev/null
I get a speed of about 7.5 GB/s.
Incidentally, that is the same speed I get if I do:
$ mkfifo out
$ dd if=/dev/zero bs=16384 >out &
$ dd if=/dev/zero bs=16384 >out &
$ dd if=/dev/zero bs=16384 >out &
$ dd if=/dev/zero bs=16384 >out &
$ pv <out -ptebaSs 800G >/dev/null
Is there any way to make this faster? Note: the buffer in the real program is not filled with zeroes.
My curiosity is to understand how much data a single program (multithreaded or multiprocess) can output.
It looks like 4 people didn't understand this simple question. I even put the reason for the question in bold.
Recommended answer
Well, it seems that the Linux scheduler and I/O priorities played a big role in the slowdown.
Also, Spectre and other CPU vulnerability mitigations came into play.
After further optimization, to achieve a faster speed I had to tune these things:
1) program nice level (nice -n -20)
2) program ionice level (ionice -c 1 -n 7)
3) pipe size increased 8 times.
4) CPU mitigations disabled by adding "pti=off spectre_v2=off l1tf=off" to the kernel command line
5) Linux scheduler tuning:
echo -n -1 >/proc/sys/kernel/sched_rt_runtime_us
echo -n -1 >/proc/sys/kernel/sched_rt_period_us
echo -n -1 >/proc/sys/kernel/sched_rr_timeslice_ms
echo -n 0 >/proc/sys/kernel/sched_tunable_scaling
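Items 1 and 3 in the list above can also be applied from inside the program. The answer does not say how the pipe was enlarged, so the following is only a sketch of one possible way on Linux, using `setpriority` and `fcntl` with `F_SETPIPE_SZ`. The helper names `raise_nice` and `grow_pipe` are invented for this example.

```c
#define _GNU_SOURCE             /* exposes F_GETPIPE_SZ / F_SETPIPE_SZ (Linux-only) */
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: in-process equivalent of "nice -n -20".
 * Returns 0 on success, -1 on failure (negative nice values usually
 * need root or CAP_SYS_NICE). */
static int raise_nice(void)
{
    return setpriority(PRIO_PROCESS, 0, -20);
}

/* Hypothetical helper: try to multiply the capacity of the pipe behind `fd`
 * by `factor`. Returns the resulting capacity in bytes, or -1 if `fd` is not
 * a pipe or the kernel refuses the request. */
static long grow_pipe(int fd, int factor)
{
    int current = fcntl(fd, F_GETPIPE_SZ);

    if (current < 0)
        return -1;                          /* not a pipe, or unsupported */
    if (fcntl(fd, F_SETPIPE_SZ, current * factor) < 0)
        return -1;                          /* e.g. above /proc/sys/fs/pipe-max-size */
    return fcntl(fd, F_GETPIPE_SZ);         /* the kernel may round the size up */
}
```

In a program such as writetest, a call like `grow_pipe (STDOUT_FILENO, 8)` near the start of main would enlarge the pipe to pv, provided stdout really is a pipe and the requested size stays under the system limit for unprivileged processes.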
Now the program outputs (on the same PC) 8.00 GB/s!
If you have other ideas, you're welcome to contribute.