Change number of threads for Tensorflow inference with C API


Question

I'm writing a C++ wrapper around the TensorFlow 1.2 C API (for inference purposes, if it matters). Since my application is multi-process and multi-threaded, with resources explicitly allocated, I would like to limit TensorFlow to only use one thread.

Currently, running a simple inference test that allows batch processing, I see that it uses all CPU cores. I have tried limiting the number of threads for a new session using a mixture of C and C++, as follows (forgive the partial code snippet, I hope it makes sense):

tensorflow::ConfigProto conf;
conf.set_intra_op_parallelism_threads(1);
conf.set_inter_op_parallelism_threads(1);
conf.add_session_inter_op_thread_pool()->set_num_threads(1);

// serialize the ConfigProto and hand the raw bytes to the C API
std::string str;
conf.SerializeToString(&str);
TF_SetConfig(m_session_opts, (void *)str.c_str(), str.size(), m_status);
if (TF_GetCode(m_status) != TF_OK) { /* handle malformed config */ }
m_session = TF_NewSession(m_graph, m_session_opts, m_status);
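As an aside, if pulling the protobuf-generated ConfigProto class into an otherwise pure-C build is undesirable, the serialized bytes for this particular config can be constructed by hand. A minimal sketch, assuming the standard ConfigProto schema where intra_op_parallelism_threads is field 2 and inter_op_parallelism_threads is field 5 (each encoded as a protobuf varint with wire type 0):

```cpp
#include <cstdint>
#include <vector>

// Hand-built serialized ConfigProto equivalent to
//   intra_op_parallelism_threads: 1
//   inter_op_parallelism_threads: 1
// Each field is a tag byte (field_number << 3 | wire_type 0)
// followed by the varint-encoded value.
std::vector<std::uint8_t> SingleThreadConfig()
{
    return {
        0x10, 0x01,  // field 2 (intra_op_parallelism_threads) = 1
        0x28, 0x01   // field 5 (inter_op_parallelism_threads) = 1
    };
}
```

The resulting buffer could then be passed as `TF_SetConfig(m_session_opts, buf.data(), buf.size(), m_status)`, avoiding the C++ protobuf dependency entirely.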

But I don't see it making any difference: all cores are still fully utilized.

Am I using the C API correctly?

(My current workaround is to recompile TensorFlow with the number of threads hard-coded to 1, which will probably work, but it's obviously not the best approach...)

-- Update --

I also tried adding:

conf.set_use_per_session_threads(true);

Without success. Multiple cores are still used...

I also tried running with high log verbosity, and got this output (showing only what I think is relevant):

tensorflow/core/common_runtime/local_device.cc:40] Local device intraop parallelism threads: 8
tensorflow/core/common_runtime/session_factory.cc:75] SessionFactory type DIRECT_SESSION accepts target: 
tensorflow/core/common_runtime/direct_session.cc:95] Direct session inter op parallelism threads for pool 0: 1

The "intraop parallelism threads: 8" message shows up as soon as I instantiate a new graph using TF_NewGraph(). I didn't find a way to specify options prior to this graph allocation, though...

Answer

I had the same problem and solved it by setting the number of threads when creating the first TF session my application creates. If the first session is not created with an options object, TF creates as many worker threads as the number of cores on the machine * 2.

Here is the C++ code I used:

// Call when the application starts, before any other TF session is created
// (needs tensorflow/core/public/session.h)
void InitThreads(int coresToUse)
{
    // configure the process-wide worker thread pools
    tensorflow::SessionOptions options;
    tensorflow::ConfigProto & config = options.config;
    if (coresToUse > 0)
    {
        config.set_inter_op_parallelism_threads(coresToUse);
        config.set_intra_op_parallelism_threads(coresToUse);
        config.set_use_per_session_threads(false);
    }
    // creating (and closing) a throwaway session makes the settings take effect
    std::unique_ptr<tensorflow::Session>
        session(tensorflow::NewSession(options));
    session->Close();
}

Pass 1 to limit the number of inter & intra threads to 1 each.
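For readers who want this flexibility through the raw C API without linking the protobuf-generated ConfigProto class: thread counts above 127 no longer fit in a single varint byte, so hand-serializing the config requires the general base-128 varint rule. A sketch, under the assumption that intra_op_parallelism_threads and inter_op_parallelism_threads are ConfigProto fields 2 and 5:

```cpp
#include <cstdint>
#include <vector>

// Append a protobuf base-128 varint to `out`:
// 7 value bits per byte, high bit set on all but the last byte.
void AppendVarint(std::vector<std::uint8_t> &out, std::uint32_t value)
{
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value) | 0x80);  // continuation bit
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Serialize a ConfigProto setting both intra- and inter-op thread counts.
// Field numbers (2 and 5) assume the standard ConfigProto schema.
std::vector<std::uint8_t> ThreadConfig(std::uint32_t intra, std::uint32_t inter)
{
    std::vector<std::uint8_t> out;
    out.push_back(0x10);          // tag: field 2, wire type 0 (varint)
    AppendVarint(out, intra);
    out.push_back(0x28);          // tag: field 5, wire type 0 (varint)
    AppendVarint(out, inter);
    return out;
}
```

The returned bytes would go straight into TF_SetConfig on the session options, just like the SerializeToString output in the question.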

IMPORTANT NOTE: This code works when called from the main application (the Google sample trainer) BUT stopped working when I moved it to a DLL dedicated to wrapping TensorFlow. TF 1.4.1 ignores the parameters I pass and spins up all threads. I would like to hear your comments...
