C++ inlining as a multithreading optimization technique?


Problem description

The question is about the possible use of inlining to improve
performance in a heavily multithreaded environment (200+ threads).
If we have to work with applications in which the threads aren't I/O
bound or user-time bound (e.g. Windows-based applications) but
are concurrently involved in the execution of the same
parallelized task (e.g. a matrix-matrix multiplication), we must ensure
that the advantage obtained by the parallelization is not less than
the overhead introduced by the context switching needed to suspend/wake
up/run threads.
My opinion is that we could use inline as a mechanism to improve context
switching performance.
An inlined function doesn't need to save its call and its parameters
on the stack. It is simply expanded into the code with a
temporary copy of any parameter it carries. Resuming a thread
at a point where an inlined
function was called simply means restoring the program counter at that
line of code instead of popping the current function call and its
parameters from the stack. It seems to me a great saving of memory space and
time.
Does anyone have comments on this, or has anyone already tried such a
solution with some result?

Gianguglielmo

Recommended answer

"gianguz" <gi*****************@noze.it> wrote in
news:11**********************@c13g2000cwb.googlegroups.com:

> My opinion is that we could use inline as a mechanism to improve context
> switching performance.


The only answer that we can give is profile, profile, profile. _You_ need
to measure both mechanisms to determine which will be faster. Inlining may
remove the function call overhead, but may cause the size of your functions
to balloon to such a point that what you saved in function calls, you're
now paying in additional page swaps, or cache misses, or whatever.
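
A minimal sketch of what "measure both mechanisms" could look like in C++. Everything in it is an assumption made for illustration: the multiply-accumulate workload, the names mac_inline, mac_call, time_loop and report_timings, and the NOINLINE macro, which relies on compiler-specific attributes (GCC/Clang and MSVC). Microbenchmarks like this are fragile, since an optimizer may transform the inlined loop well beyond removing the call, so the numbers are only a starting point before profiling the real application.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE /* unknown compiler: inlining cannot be forbidden here */
#endif

// Hypothetical workload: one multiply-accumulate step of a matrix product.
// The inline keyword is only a hint; the compiler makes the final decision.
inline std::uint64_t mac_inline(std::uint64_t acc, std::uint64_t a, std::uint64_t b) {
    return acc + a * b;
}

// Same body, but the compiler is asked to keep it as a real call.
NOINLINE std::uint64_t mac_call(std::uint64_t acc, std::uint64_t a, std::uint64_t b) {
    return acc + a * b;
}

struct Timing {
    std::uint64_t result;   // accumulated value, returned so the loop is not dead code
    long long nanoseconds;  // wall-clock time for the whole loop
};

// Applies step n times and measures the elapsed time with a monotonic clock.
template <typename Step>
Timing time_loop(Step step, std::uint64_t n) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        acc = step(acc, i, i + 1);  // unsigned arithmetic wraps, which is well defined
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return Timing{acc, static_cast<long long>(ns.count())};
}

Timing time_inline(std::uint64_t n) {
    return time_loop([](std::uint64_t acc, std::uint64_t a, std::uint64_t b) {
        return mac_inline(acc, a, b);
    }, n);
}

Timing time_call(std::uint64_t n) {
    return time_loop([](std::uint64_t acc, std::uint64_t a, std::uint64_t b) {
        return mac_call(acc, a, b);
    }, n);
}

// Prints both timings side by side; call this from main.
void report_timings(std::uint64_t n) {
    const Timing a = time_inline(n);
    const Timing b = time_call(n);
    std::printf("inline: %lld ns, call: %lld ns, results match: %s\n",
                a.nanoseconds, b.nanoseconds, a.result == b.result ? "yes" : "no");
}
```

Calling report_timings(100000000) from main in an optimized build prints one line with both timings. The values depend on the compiler, its flags and the machine, which is exactly why the measurement has to be repeated on the target setup.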


You are right about profiling. It is the only way to get a formal
analysis, but I'm also trying to figure out a robustness aspect. Suppose
you have 1000 threads continuously running in the same application (e.g.
in a system like a multi-agent framework in which each agent is
implemented through a thread object, this can be a normal value) and
they are calling a predefined sequence of functions. A simple calling
sequence like f(g(h(k(x, y, z)))) will produce a great number
of temporaries and function execution points to be saved on the stack.
Even if performance could be better, equal, or worse with inlining,
the amount of memory saved seems evident. Moreover, the
possibility of reaching the limits of the run-time environment's data
structures is lower than in the non-inlined case.

Gianguglielmo
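
One detail worth checking before estimating the stack cost: in the expression f(g(h(k(x, y, z)))), k returns before h is called, h returns before g is called, and so on, so the four calls are never active at the same time. The sketch below counts how many traced functions are active at once, comparing the composed expression with a hypothetical variant in which each function calls the next. The DepthProbe helper, the function bodies and the *_nested variants are illustrative assumptions, not code from the thread, and the probe counts logical call nesting, not bytes of stack.

```cpp
#include <algorithm>
#include <cstdio>

// Counts how many traced functions are active at the same time.
struct DepthProbe {
    static int current;  // traced calls active right now
    static int deepest;  // highest value current has reached
    DepthProbe() { deepest = std::max(deepest, ++current); }
    ~DepthProbe() { --current; }
};
int DepthProbe::current = 0;
int DepthProbe::deepest = 0;

// Placeholder bodies: the thread does not say what f, g, h and k compute.
int k(int x, int y, int z) { DepthProbe p; return x + y + z; }
int h(int v) { DepthProbe p; return v * 2; }
int g(int v) { DepthProbe p; return v + 1; }
int f(int v) { DepthProbe p; return v * v; }

// Same arithmetic, but each function calls the next one, so calls pile up.
int h_nested(int x, int y, int z) { DepthProbe p; return k(x, y, z) * 2; }
int g_nested(int x, int y, int z) { DepthProbe p; return h_nested(x, y, z) + 1; }
int f_nested(int x, int y, int z) {
    DepthProbe p;
    const int v = g_nested(x, y, z);
    return v * v;
}

// Deepest nesting observed while evaluating f(g(h(k(x, y, z)))).
int deepest_composed(int x, int y, int z) {
    DepthProbe::current = 0;
    DepthProbe::deepest = 0;
    const int r = f(g(h(k(x, y, z))));
    (void)r;  // only the depth matters here
    return DepthProbe::deepest;
}

// Deepest nesting observed when f calls g, g calls h, and h calls k.
int deepest_nested(int x, int y, int z) {
    DepthProbe::current = 0;
    DepthProbe::deepest = 0;
    const int r = f_nested(x, y, z);
    (void)r;
    return DepthProbe::deepest;
}

// Prints both depths; call this from main.
void report_depths() {
    std::printf("composed: %d, nested: %d\n",
                deepest_composed(1, 2, 3), deepest_nested(1, 2, 3));
}
```

With these definitions deepest_composed(1, 2, 3) reports a depth of 1 and deepest_nested(1, 2, 3) a depth of 4. For the composed form, at most one callee is active below the caller at any moment, so inlining has little stack depth left to remove. Only the genuinely nested form stacks several frames.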


On 13 Dec 2004 02:06:24 -0800, "gianguz" <gi*****************@noze.it>
wrote:

> An inlined function doesn't need to save its call and its parameters
> on the stack. It is simply expanded into the code with a
> temporary copy of any parameter it carries. Resuming a thread
> at a point where an inlined
> function was called simply means restoring the program counter at that
> line of code instead of popping the current function call and its
> parameters from the stack. It seems to me a great saving of memory space and
> time.

I'm not sure about this. If you context switch into an inlined
function call, it is like context switching into the calling context
of the inline call - you are still switching into a function (in
assembler terms), it's just the calling one, not the inlined one.

> Does anyone have comments on this, or has anyone already tried such a
> solution with some result?

Well, inlining may or may not reduce the size of the stack required by
a thread, and it may or may not reduce the size of the code (the
saving in not setting up a function call is offset against possible
multiple copies of the same code).

However, I'm not sure inlining has any influence on multithreaded code
that it doesn't have on single-threaded code. Context switches between
threads (and processes on some architectures like QNX) just involve
restoring a load of registers as far as I know - you have to do this
whether switching into an inline function or not.

Tom

