Multithreading - How to use CPU as much as possible?

Views: 54 Posted: 2021/6/21 20:18:17 Tags: python c++ multithreading tensorflow profiling


Problem description

I'm currently implementing a TensorFlow custom op (for a custom data fetcher) in C++ in order to speed up my TensorFlow model. Since my model doesn't use the GPU much, I believe I can achieve maximal performance by using multiple worker threads concurrently.

The problem is that even though I have enough workers, my program doesn't utilize all of the CPU. On my development machine (4 physical cores), it uses about 90% user time and 4% sys time with 4 worker threads and the tf.ConfigProto(inter_op_parallelism_threads=6) option.
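For reference, user and sys percentages like the ones above can be sampled for a single process from inside Python. The following is only a minimal sketch using the standard library: `cpu_usage` and `busy_loop` are hypothetical names, and the busy loop is a placeholder for the real workload (for example, a batch of sess.run() calls). `os.times()` reports CPU time accumulated by all threads of the current process, so native worker threads are included.

```python
import os
import time


def cpu_usage(workload, n_cores=None):
    """Run workload() and return (user_pct, sys_pct) relative to total CPU capacity.

    100% means every one of n_cores was busy for the whole wall-clock interval.
    """
    n_cores = n_cores or os.cpu_count() or 1
    cpu_before = os.times()          # process-wide CPU times (all threads)
    wall_before = time.perf_counter()
    workload()
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()

    capacity = wall * n_cores        # CPU-seconds available in the interval
    user_pct = 100.0 * (cpu_after.user - cpu_before.user) / capacity
    sys_pct = 100.0 * (cpu_after.system - cpu_before.system) / capacity
    return user_pct, sys_pct


def busy_loop():
    """Placeholder workload (assumption): replace with the real model loop."""
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    user_pct, sys_pct = cpu_usage(busy_loop)
    print(f"user {user_pct:.1f}%  sys {sys_pct:.1f}%")
```

A high user share with a low sys share, as reported above, suggests the time is mostly spent in computation rather than in kernel calls, but it does not by itself show which threads are waiting on locks.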

With more worker threads and a higher inter_op_parallelism_threads setting, the model runs much slower than with the previous configuration. Since I'm not good at profiling, I don't know where the bottleneck in my code is.

Are there any rules of thumb for maximizing CPU usage, and/or good tools for finding performance bottlenecks/mutex locks in a single process (not system-wide) on Linux?

My code is run from Python, but (almost) all execution happens in C++ code. Some of that code is not mine (TensorFlow and Eigen). I've made a shared library that can be dynamically loaded in Python, and it is called by the TensorFlow kernel. TensorFlow owns its thread pool, my dynamic library also owns a thread pool, and my code is thread-safe. I also create threads that call sess.run() concurrently. Just as Python can issue multiple HTTP requests concurrently, sess.run() releases the GIL. My objective is to call sess.run() as many times as possible to increase "real" performance, and none of the Python-related profilers I tried were successful.
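The calling pattern described here (several Python threads driving a native call that releases the GIL) can be reproduced in isolation to see how throughput changes with the number of driver threads. The sketch below is an assumption-laden stand-in, not the actual setup: `run_step` is a hypothetical replacement for sess.run(), implemented with `hashlib.sha256` on a large buffer because hashlib releases the GIL while hashing data larger than 2047 bytes. `throughput` and `PAYLOAD` are likewise illustrative names.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

# 8 MiB of data so that each call does a noticeable amount of native work.
PAYLOAD = b"x" * (8 * 1024 * 1024)


def run_step(_):
    """Hypothetical stand-in for sess.run(): native work that releases the GIL."""
    return hashlib.sha256(PAYLOAD).hexdigest()


def throughput(n_threads, n_calls=16):
    """Issue n_calls from n_threads driver threads and return calls per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(run_step, range(n_calls)))
    elapsed = time.perf_counter() - start
    assert len(results) == n_calls
    return n_calls / elapsed


if __name__ == "__main__":
    # Sweep the number of driver threads and watch where throughput stops improving.
    for n in (1, 2, 4, 8):
        print(f"{n} driver threads: {throughput(n):.1f} calls/s")
```

Swapping `run_step` for the real sess.run() call and sweeping the thread count in the same way gives a throughput curve rather than a single CPU percentage, which makes it easier to see at which thread count performance starts to degrade.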
