User space vs. kernel space program performance difference

Problem description

I have a sequential user-space program (a memory-intensive search data structure). Its performance, measured in CPU cycles, depends on the memory layout of the underlying data structures and on the size of the data cache (LLC).

My user-space program is already tuned to death, so now I am wondering whether I can gain performance by moving the code into the kernel (as a kernel module). I can think of the following factors that might improve performance in kernel space:

  1. No system call overhead (how many CPU cycles are gained per system call?). This is less critical, as I barely use any system calls in my program except for allocating memory, and that only when the program starts.
  2. Control over scheduling: I can create a kernel thread and make it run on a given core without being preempted.
  3. I can use kmalloc for memory allocation and thus have more control over the allocated memory. I may also be able to control cache coloring more precisely by controlling where memory is allocated. Is it worth trying?

My questions for the kernel experts:

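  • Have I missed any factors, beyond those listed above, that could improve performance further?
  • Is it worth trying, or is it already known that I will not gain much performance?
  • If a performance gain is possible in the kernel, is there any estimate (even a theoretical guess) of how large it would be?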

Thanks.

Recommended answer

Regarding point 1: kernel threads can still be preempted, so unless you're making lots of syscalls (which you aren't), this won't buy you much.

Regarding point 2: you can pin a thread to a specific core by setting its affinity, using sched_setaffinity() on Linux.
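As a minimal sketch of that pinning step, assuming Linux with glibc: the helper name pin_to_cpu is hypothetical, while sched_setaffinity() and the CPU_ZERO/CPU_SET macros are the standard interfaces. Note that affinity only restricts where this thread may run; it does not by itself keep other tasks off that core.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t, CPU_SET and sched_getcpu() on glibc */
#include <sched.h>

/*
 * Pin the calling thread to a single CPU.
 * pin_to_cpu is a hypothetical helper name, not part of any library.
 * Returns 0 on success, -1 on failure (errno is set by sched_setaffinity).
 */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);       /* start with an empty CPU mask */
    CPU_SET(cpu, &set);   /* allow exactly one CPU */

    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Calling pin_to_cpu() once at program start, before the data structure is built, also tends to keep first-touch page allocation on the memory node local to that core.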

Regarding point 3: what extra control are you expecting? You can already allocate page-aligned memory from user space using mmap(). That already lets you control for the cache's set associativity, and you can use inline assembly or compiler intrinsics for any manual prefetch hints or non-temporal writes. The main difference between memory allocated in the kernel and in user space is that kmalloc() allocates wired (non-pageable) memory. I don't see how this would help.
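A rough sketch of both ideas from user space, assuming Linux and GCC or Clang: alloc_pages and sum_with_prefetch are hypothetical names, the prefetch distance of 16 elements is an arbitrary placeholder that would need tuning, and __builtin_prefetch is the compiler builtin (a hint that the CPU is free to ignore).

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict ISO C modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map `bytes` of zero-filled anonymous memory. mmap() returns addresses
 * aligned to the page size. Returns NULL on failure.
 * (alloc_pages is a hypothetical helper name.)
 */
static void *alloc_pages(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Sum an array while issuing a software prefetch a fixed distance ahead.
 * The distance is an assumption for illustration; the right value depends
 * on the access pattern and the memory latency of the target machine.
 */
static uint64_t sum_with_prefetch(const uint64_t *a, size_t n)
{
    const size_t dist = 16;   /* hypothetical prefetch distance, in elements */
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + dist < n)
            __builtin_prefetch(&a[i + dist], 0 /* read */, 0 /* low temporal locality */);
        sum += a[i];
    }
    return sum;
}
```

For a pointer-chasing search structure the same hint would be applied to the next node's address as soon as it is known, rather than to a fixed array offset.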

I suspect you'll see a much better ROI from parallelising with SIMD or multithreading, or from making further algorithmic or memory optimisations.
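To illustrate the multithreading option, here is a sketch that splits a batch of independent lookups across POSIX threads. It assumes the search structure is read-only during the lookups and stands in a sorted array with binary search for the real data structure; NUM_THREADS, struct search_job and all function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4   /* hypothetical worker count */

struct search_job {
    const uint32_t *table;    /* sorted, read-only, shared by all threads */
    size_t table_len;
    const uint32_t *queries;  /* this thread's slice of the query batch */
    size_t num_queries;
    size_t hits;              /* written once, when the thread finishes */
};

/* Binary search: returns 1 if key is present in the sorted table, else 0. */
static int contains(const uint32_t *t, size_t n, uint32_t key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < n && t[lo] == key;
}

static void *run_job(void *arg)
{
    struct search_job *job = arg;
    size_t hits = 0;   /* local counter avoids false sharing between jobs */

    for (size_t i = 0; i < job->num_queries; i++)
        hits += (size_t)contains(job->table, job->table_len, job->queries[i]);
    job->hits = hits;
    return NULL;
}

/* Returns the number of queries found, or (size_t)-1 if a thread could not be created. */
static size_t count_hits_parallel(const uint32_t *table, size_t table_len,
                                  const uint32_t *queries, size_t num_queries)
{
    pthread_t threads[NUM_THREADS];
    struct search_job jobs[NUM_THREADS];
    size_t chunk = (num_queries + NUM_THREADS - 1) / NUM_THREADS;
    size_t started = 0, total = 0;
    int failed = 0;

    for (size_t t = 0; t < NUM_THREADS; t++) {
        size_t begin = t * chunk;
        if (begin > num_queries)
            begin = num_queries;
        size_t end = begin + chunk;
        if (end > num_queries)
            end = num_queries;

        jobs[t] = (struct search_job){ table, table_len,
                                       queries + begin, end - begin, 0 };
        if (pthread_create(&threads[t], NULL, run_job, &jobs[t]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }

    /* Join every thread that was started, even after a failure. */
    for (size_t t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += jobs[t].hits;
    }
    return failed ? (size_t)-1 : total;
}
```

Whether this helps depends on the workload: if the threads share one last-level cache and the structure already exceeds it, memory bandwidth rather than core count may become the limit.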
