Segmentation fault when using TF_SessionRun to run TensorFlow graph in C (not C++)

Problem description

I'm trying to load and run a TensorFlow graph using the C API (I need to build outside of the TensorFlow project, and preferably without Bazel, so can't use C++).
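
For anyone setting up the same kind of build, here's a minimal sanity-check sketch (not from the original post) that only confirms the standalone C library compiles and links outside the TensorFlow tree; the include path and the usual -ltensorflow link flag are assumptions about a typical libtensorflow install:

#include "tensorflow/c/c_api.h"
#include <stdio.h>

int main() {
  // TF_Version() is the simplest call in the C API; if this prints a version
  // string, the headers and the shared library are wired up correctly.
  printf("TensorFlow C library version: %s\n", TF_Version());
  return 0;
}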

The graph is a 3-layer LSTM-RNN which classifies feature vectors of 3 elements into one of 9 classes. The graph is built and trained in Python, and I've tested it in both Python and C++.

So far I've got the graph loading; however, I'm having trouble running a session once the graph is loaded. I've done a fair bit of digging around, but I've only found one example using the C API (here), and that doesn't include running the graph.

I've managed to put together the following, but it produces a segmentation fault (I can successfully run the code if I comment out the TF_SessionRun() call, but I get the seg fault when TF_SessionRun() is included). Here's the code:

#include "tensorflow/c/c_api.h"
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <string.h>
#include <assert.h>
#include <vector>
#include <algorithm>
#include <iterator>


TF_Buffer* read_file(const char* file);

void free_buffer(void* data, size_t length) {
        free(data);
}

static void Deallocator(void* data, size_t length, void* arg) {
        free(data);
}

int main() {
  // Use read_file to get graph_def as TF_Buffer*
  TF_Buffer* graph_def = read_file("tensorflow_model/constant_graph_weights.pb");
  TF_Graph* graph = TF_NewGraph();

  // Import graph_def into graph
  TF_Status* status = TF_NewStatus();
  TF_ImportGraphDefOptions* graph_opts = TF_NewImportGraphDefOptions();
  TF_GraphImportGraphDef(graph, graph_def, graph_opts, status);
  if (TF_GetCode(status) != TF_OK) {
          fprintf(stderr, "ERROR: Unable to import graph %s", TF_Message(status));
          return 1;
  }
  else {
          fprintf(stdout, "Successfully imported graph\n");
  }

  // Configure input & provide dummy values
  const int num_bytes = 3 * sizeof(float);
  const int num_bytes_out = 9 * sizeof(int);
  int64_t dims[] = {3};
  int64_t out_dims[] = {9};

  float values[3] = {-1.04585315e+03,   1.25702492e+02,   1.11165466e+02};


  // Setup graph inputs
  std::vector<TF_Tensor*> input_values;
  TF_Operation* input_op = TF_GraphOperationByName(graph, "lstm_1_input");
  TF_Output inputs = {input_op, 0};
  TF_Tensor* input = TF_NewTensor(TF_FLOAT, dims, 1, &values, num_bytes, &Deallocator, 0);
  input_values.push_back(input);

  // Setup graph outputs
  TF_Operation* output_op = TF_GraphOperationByName(graph, "output_node0");
  TF_Output outputs = {output_op, 0};
  std::vector<TF_Tensor*> output_values(9, nullptr);

  // Run graph
  fprintf(stdout, "Running session...\n");
  TF_SessionOptions* sess_opts = TF_NewSessionOptions();
  TF_Session* session = TF_NewSession(graph, sess_opts, status);
  assert(TF_GetCode(status) == TF_OK);
  TF_SessionRun(session, nullptr,
                &inputs, &input_values[0], 3,
                &outputs, &output_values[0], 9,
                nullptr, 0, nullptr, status);

  fprintf(stdout, "Successfully run session\n");

  TF_CloseSession(session, status);
  TF_DeleteSession(session, status);
  TF_DeleteSessionOptions(sess_opts);
  TF_DeleteImportGraphDefOptions(graph_opts);
  TF_DeleteGraph(graph);
  TF_DeleteStatus(status);
  return 0;
}

TF_Buffer* read_file(const char* file) {
  FILE *f = fopen(file, "rb");
  fseek(f, 0, SEEK_END);
  long fsize = ftell(f);
  fseek(f, 0, SEEK_SET);

  void* data = malloc(fsize);
  fread(data, fsize, 1, f);
  fclose(f);

  TF_Buffer* buf = TF_NewBuffer();
  buf->data = data;
  buf->length = fsize;
  buf->data_deallocator = free_buffer;
  return buf;
}

I'm not sure exactly where I'm going wrong with TF_SessionRun, so any help would be greatly appreciated!

Update: I've set a breakpoint at the TF_SessionRun call in gdb, and as I step through it I first get: "Thread 1 received signal SIGSEGV, Segmentation fault. 0x0000000100097650 in ?? ()", followed by: "Cannot find bounds of current function". I initially thought this was because the TensorFlow library wasn't compiled with debug symbols, but I have since compiled it with debug symbols and get the same output in gdb.

Since my original post, I found a TensorFlow C example here (though the author points out that it's untested). I've since rewritten my code according to that example and double-checked everything against TensorFlow's c_api.h header file. I'm also now calling the C API from a C++ file (as that's what's done in the example above). Despite all this, I'm still getting the same output from gdb.

Update 2: To ensure that my graph is loading properly, I've used some of the TF_Operation functions in the C API (TF_GraphNextOperation() and TF_OperationName()) to check the graph operations, and have compared these with the operations when loading the graph in Python. The output looks correct, and I can retrieve properties from the operations (e.g. using TF_OperationNumOutputs()), so it appears the graph is definitely loading correctly.
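
For reference, the check described above can be written as a short loop over the graph (a sketch along the same lines, assuming graph already holds the imported graph; TF_GraphNextOperation() advances pos and returns a null pointer once every operation has been visited):

  size_t pos = 0;
  TF_Operation* op = nullptr;
  while ((op = TF_GraphNextOperation(graph, &pos)) != nullptr) {
    // Print each operation's name and how many outputs it exposes
    std::cout << TF_OperationName(op) << ", outputs: "
              << TF_OperationNumOutputs(op) << "\n";
  }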

Advice from someone with experience using TensorFlow's C API would be greatly appreciated.

Answer

I managed to resolve the issue after spending more time trying out functions in the C API and paying close attention to the dimensionality of my placeholders. My original seg fault was caused by passing the wrong operation name string to TF_GraphOperationByName(); however, the seg fault only occurred at TF_SessionRun(), because that's the first place the code tried to access that operation. Here's how I resolved the issue, for anyone facing the same problem:
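
One small guard that makes this kind of mistake surface immediately (a suggested addition, not part of the original code): TF_GraphOperationByName() returns a null pointer when the name doesn't exist in the graph, and nothing crashes until that null handle is first used, which here happened inside TF_SessionRun(). Checking the return value fails fast with a readable error instead:

  TF_Operation* input_op = TF_GraphOperationByName(graph, "lstm_1_input");
  if (input_op == nullptr) {
    fprintf(stderr, "ERROR: operation 'lstm_1_input' not found in graph\n");
    return 1;
  }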

Firstly, check your operations to ensure that they're assigned correctly. In my case, the operation name I provided to input_op was incorrect due to an error when obtaining the operation names in Python. The incorrect op name I got from Python was 'lstm_4_input'. I found this was wrong by running the following on the loaded graph with the C API:

  int n_ops = 700;
  for (int i = 0; i < n_ops; i++)
  {
    size_t pos = i;
    TF_Operation* op = TF_GraphNextOperation(graph, &pos);
    if (op == nullptr) break;  // ran out of operations before reaching n_ops
    std::cout << "Input: " << TF_OperationName(op) << "\n";
  }

Where n_ops is the number of operations in your graph. This will print out your operation names; in this case I could see there was no 'lstm_4_input', but there was an 'lstm_1_input', so I changed the value accordingly. Furthermore, it validated that my output operation, 'output_node0', was correct.

There were a few other issues that became clear once I resolved the seg fault, so here's the complete working code, with detailed comments, for anyone facing similar problems:

#include "tensorflow/c/c_api.h"

#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <string.h>
#include <assert.h>
#include <vector>
#include <algorithm>
#include <iterator>
#include <iostream>


TF_Buffer* read_file(const char* file);

void free_buffer(void* data, size_t length) {
        free(data);
}

static void Deallocator(void* data, size_t length, void* arg) {
        free(data);
        // *reinterpret_cast<bool*>(arg) = true;
}

int main() {
  // Use read_file to get graph_def as TF_Buffer*
  TF_Buffer* graph_def = read_file("tensorflow_model/constant_graph_weights.pb");
  TF_Graph* graph = TF_NewGraph();

  // Import graph_def into graph
  TF_Status* status = TF_NewStatus();
  TF_ImportGraphDefOptions* graph_opts = TF_NewImportGraphDefOptions();
  TF_GraphImportGraphDef(graph, graph_def, graph_opts, status);
  if (TF_GetCode(status) != TF_OK) {
          fprintf(stderr, "ERROR: Unable to import graph %s", TF_Message(status));
          return 1;
  }
  else {
          fprintf(stdout, "Successfully imported graph\n");
  }

  // Create variables to store the size of the input and output variables
  const int num_bytes_in = 3 * sizeof(float);
  const int num_bytes_out = 9 * sizeof(float);

  // Set input dimensions - this should match the dimensionality of the input in
  // the loaded graph, in this case it's three dimensional.
  int64_t in_dims[] = {1, 1, 3};
  int64_t out_dims[] = {1, 9};

  // ######################
  // Set up graph inputs
  // ######################

  // Create a variable containing your values, in this case the input is a
  // 3-dimensional float
  float values[3] = {-1.04585315e+03,   1.25702492e+02,   1.11165466e+02};

  // Create vectors to store graph input operations and input tensors
  std::vector<TF_Output> inputs;
  std::vector<TF_Tensor*> input_values;

  // Pass the graph and a string name of your input operation
  // (make sure the operation name is correct)
  TF_Operation* input_op = TF_GraphOperationByName(graph, "lstm_1_input");
  TF_Output input_opout = {input_op, 0};
  inputs.push_back(input_opout);

  // Create the input tensor using the dimension (in_dims) and size (num_bytes_in)
  // variables created earlier
  TF_Tensor* input = TF_NewTensor(TF_FLOAT, in_dims, 3, values, num_bytes_in, &Deallocator, 0);
  input_values.push_back(input);

  // Optionally, you can check that your input_op and input tensors are correct
  // by using some of the functions provided by the C API.
  std::cout << "Input op info: " << TF_OperationNumOutputs(input_op) << "\n";
  std::cout << "Input data info: " << TF_Dim(input, 0) << "\n";

  // ######################
  // Set up graph outputs (similar to setting up graph inputs)
  // ######################

  // Create vector to store graph output operations
  std::vector<TF_Output> outputs;
  TF_Operation* output_op = TF_GraphOperationByName(graph, "output_node0");
  TF_Output output_opout = {output_op, 0};
  outputs.push_back(output_opout);

  // Create TF_Tensor* vector
  std::vector<TF_Tensor*> output_values(outputs.size(), nullptr);

  // Similar to creating the input tensor, however here we don't yet have the
  // output values, so we use TF_AllocateTensor()
  TF_Tensor* output_value = TF_AllocateTensor(TF_FLOAT, out_dims, 2, num_bytes_out);
  output_values[0] = output_value;  // keep the vector the same size as 'outputs'

  // As with inputs, check the values for the output operation and output tensor
  std::cout << "Output: " << TF_OperationName(output_op) << "\n";
  std::cout << "Output info: " << TF_Dim(output_value, 0) << "\n";

  // ######################
  // Run graph
  // ######################
  fprintf(stdout, "Running session...\n");
  TF_SessionOptions* sess_opts = TF_NewSessionOptions();
  TF_Session* session = TF_NewSession(graph, sess_opts, status);
  assert(TF_GetCode(status) == TF_OK);

  // Call TF_SessionRun
  TF_SessionRun(session, nullptr,
                &inputs[0], &input_values[0], inputs.size(),
                &outputs[0], &output_values[0], outputs.size(),
                nullptr, 0, nullptr, status);

  // Assign the values from the output tensor to a variable and iterate over them
  float* out_vals = static_cast<float*>(TF_TensorData(output_values[0]));
  for (int i = 0; i < 9; ++i)
  {
      std::cout << "Output values info: " << *out_vals++ << "\n";
  }

  fprintf(stdout, "Successfully run session\n");

  // Delete variables
  TF_CloseSession(session, status);
  TF_DeleteSession(session, status);
  TF_DeleteSessionOptions(sess_opts);
  TF_DeleteImportGraphDefOptions(graph_opts);
  TF_DeleteGraph(graph);
  TF_DeleteStatus(status);
  return 0;
}

TF_Buffer* read_file(const char* file) {
  FILE *f = fopen(file, "rb");
  fseek(f, 0, SEEK_END);
  long fsize = ftell(f);
  fseek(f, 0, SEEK_SET);  //same as rewind(f);

  void* data = malloc(fsize);
  fread(data, fsize, 1, f);
  fclose(f);

  TF_Buffer* buf = TF_NewBuffer();
  buf->data = data;
  buf->length = fsize;
  buf->data_deallocator = free_buffer;
  return buf;
}

Note: in my earlier attempt, I used '3' and '9' as the ninputs and noutputs arguments for TF_SessionRun(), thinking that these related to the length of my input and output tensors (I'm classifying 3-dimensional features into one of 9 classes). In fact, these are simply the number of input/output tensors; the dimensionality of the tensors is handled earlier, when they're instantiated. It's easiest to just use the .size() member function here (when using std::vectors to hold the TF_Outputs).
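
Concretely, the call in the listing above amounts to this (the same call, with the counts annotated; one input tensor of shape 1x1x3 and one output tensor of shape 1x9, so both counts are 1):

  TF_SessionRun(session, nullptr,
                &inputs[0], &input_values[0], inputs.size(),    // ninputs  = 1 tensor
                &outputs[0], &output_values[0], outputs.size(), // noutputs = 1 tensor
                nullptr, 0, nullptr, status);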

Hopefully this makes sense and helps to clarify the process for anyone who finds themselves in a similar position in future!
