Malloc segmentation fault

Problem description

Here is the piece of code in which the segmentation fault occurs (perror is not being called):

job = malloc(sizeof(task_t));
if(job == NULL)
    perror("malloc");

To be more precise, gdb says that the segfault happens inside a __int_malloc call, which is a sub-routine call made by malloc.

Since malloc is called in parallel from several threads, I initially thought that could be the problem. I was using glibc version 2.19.

The data structures:

typedef struct rv_thread thread_wrapper_t;

typedef struct future
{
  pthread_cond_t wait;
  pthread_mutex_t mutex;
  long completed;
} future_t;

typedef struct task
{
  future_t * f;
  void * data;
  void *
  (*fun)(thread_wrapper_t *, void *);
} task_t;

typedef struct
{
  queue_t * queue;
} pool_worker_t;

typedef struct
{
  task_t * t;
} sfuture_t;

struct rv_thread
{
  pool_worker_t * pool;
};

Now the future implementation:

future_t *
create_future()
{
  future_t * new_f = malloc(sizeof(future_t));
  if(new_f == NULL)
    perror("malloc");
  new_f->completed = 0;
  pthread_mutex_init(&(new_f->mutex), NULL);
  pthread_cond_init(&(new_f->wait), NULL);
  return new_f;
}

int
wait_future(future_t * f)
{
  pthread_mutex_lock(&(f->mutex));
  while (!f->completed)
    {
      pthread_cond_wait(&(f->wait),&(f->mutex));
    }
  pthread_mutex_unlock(&(f->mutex));
  return 0;
}

void
complete(future_t * f)
{
  pthread_mutex_lock(&(f->mutex));
  f->completed = 1;
  pthread_mutex_unlock(&(f->mutex));
  pthread_cond_broadcast(&(f->wait));
}

The thread pool itself:

pool_worker_t *
create_work_pool(int threads)
{
  pool_worker_t * new_p = malloc(sizeof(pool_worker_t));
  if(new_p == NULL)
    perror("malloc");
  threads = 1;
  new_p->queue = create_queue();
  int i;
  for (i = 0; i < threads; i++){
    thread_wrapper_t * w = malloc(sizeof(thread_wrapper_t));
    if(w == NULL)
      perror("malloc");
    w->pool = new_p;
    pthread_t n;
    pthread_create(&n, NULL, work, w);
  }
  return new_p;
}

task_t *
try_get_new_task(thread_wrapper_t * thr)
{
  task_t * t = NULL;
  try_dequeue(thr->pool->queue, t);
  return t;
}

void
submit_job(pool_worker_t * p, task_t * t)
{
  enqueue(p->queue, t);
}

void *
work(void * data)
{
  thread_wrapper_t * thr = (thread_wrapper_t *) data;
  while (1){
    task_t * t = NULL;
    while ((t = (task_t *) try_get_new_task(thr)) == NULL);
    future_t * f = t->f;
    (*(t->fun))(thr,t->data);
    complete(f);
  }
  pthread_exit(NULL);
}

And finally, task.c:

pool_worker_t *
create_tpool()
{
  return (create_work_pool(8));
}

sfuture_t *
async(pool_worker_t * p, thread_wrapper_t * thr, void *
(*fun)(thread_wrapper_t *, void *), void * data)
{
  task_t * job = NULL;
  job = malloc(sizeof(task_t));
  if(job == NULL)
    perror("malloc");
  job->data = data;
  job->fun = fun;
  job->f = create_future();
  submit_job(p, job);
  sfuture_t * new_t = malloc(sizeof(sfuture_t));
  if(new_t == NULL)
    perror("malloc");
  new_t->t = job;
  return (new_t);
}

void
mywait(thread_wrapper_t * thr, sfuture_t * sf)
{
  if (sf == NULL)
    return;
  if (thr != NULL)
    {
      while (!sf->t->f->completed)
        {
          task_t * t_n = try_get_new_task(thr);
          if (t_n != NULL)
            {
              future_t * f = t_n->f;
              (*(t_n->fun))(thr,t_n->data);
              complete(f);
            }
        }
      return;
    }
  wait_future(sf->t->f);
  return ;
}

The queue is the lfds lock-free queue.

#define enqueue(q,t) {                                 \
    if(!lfds611_queue_enqueue(q->lq, t))               \
      {                                                \
        lfds611_queue_guaranteed_enqueue(q->lq, t);    \
      }                                                \
  }

#define try_dequeue(q,t) {                             \
    lfds611_queue_dequeue(q->lq, &t);                  \
  }

The problem happens whenever the number of calls to async is very high.

Valgrind output:

Process terminating with default action of signal 11 (SIGSEGV)
==12022==  Bad permissions for mapped region at address 0x5AF9FF8
==12022==    at 0x4C28737: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

Recommended answer

I've figured out what the problem is: a stack overflow.

First, let me explain why the stack overflow occurs inside malloc (which is probably why you are reading this). While my program ran, the stack kept growing each time it started executing another task recursively (because of the way I had programmed it). Each of those steps also had to allocate a new task with malloc. malloc itself makes further sub-routine calls, so during the allocation the stack is deeper than it is for a plain call that executes another task. In other words, I would have hit a stack overflow even without malloc; but because the stack is at its deepest inside malloc, that is where it overflowed first, before the next recursive call could do it. The illustration below shows what was happening:

Initial stack state:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
|        garbage        |
-------------------------
|        garbage        | <- If the stack passes this point, the stack overflows.
-------------------------

Stack during the malloc call:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
|        malloc         |
-------------------------
|     __int_malloc      | <- If the stack passes this point, the stack overflows.
-------------------------

Then the stack shrank again, and my code entered a new recursive call:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
| recursive call n      |
-------------------------
|        garbage        | <- If the stack passes this point, the stack overflows.
-------------------------

Then it invoked malloc again inside this new recursive call. This time, however, the stack overflowed:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
| recursive call n      |
-------------------------
|        malloc         | <- If the stack passes this point, the stack overflows.
-------------------------
|     __int_malloc      | <- This is when the stack overflow occurs.
-------------------------
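
To see how much room the diagrams above are working with, you can ask the system for the stack limit. The following is a minimal sketch, assuming a POSIX system; the helper names `stack_soft_limit` and `print_stack_limit` are hypothetical and not part of the original program. It reports the soft limit that applies to the process's main stack; threads created with pthread_create get their own stacks, whose size may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: stores the soft stack limit (in bytes) in *out.
   Returns 0 on success, -1 if getrlimit fails. */
int stack_soft_limit(rlim_t *out)
{
  struct rlimit rl;

  assert(out != NULL);
  if (getrlimit(RLIMIT_STACK, &rl) != 0)
    return -1;
  *out = rl.rlim_cur;
  return 0;
}

/* Hypothetical helper: prints the limit in a readable form. */
void print_stack_limit(void)
{
  rlim_t lim;

  if (stack_soft_limit(&lim) != 0)
    {
      perror("getrlimit");
      return;
    }
  if (lim == RLIM_INFINITY)
    printf("stack limit: unlimited\n");
  else
    printf("stack limit: %llu bytes\n", (unsigned long long) lim);
}
```

Calling print_stack_limit() at program start gives a rough upper bound on how deep the nested task calls plus the malloc frames can go before the process receives SIGSEGV.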

[The rest of the answer focuses on why I had this problem in my code in particular.]

Usually, when computing Fibonacci recursively for some number n, the stack size grows linearly with n. In this case, however, I was creating tasks, storing them in a queue, and dequeuing a (fib) task for execution. If you draw this on paper, you'll see that the number of tasks grows exponentially with n rather than linearly. (Note also that if I had used a stack to store the tasks as they were created, the number of tasks allocated, as well as the stack size, would only grow linearly with n.) So the stack grows exponentially with n, which leads to a stack overflow.

As for why the overflow occurs inside the call to malloc: as I explained above, that is where the stack is deepest. The stack was already almost exhausted, and since malloc calls further functions internally, it grows the stack beyond what the calls to mywait and fib alone would need.
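
The "draw this on paper" exercise can also be done in code. The following is a minimal single-threaded sketch, not the original program: it replaces the lfds queue with a plain array used as a FIFO, and all names (`sim_task_t`, `sim_async`, `sim_wait`, `run_fib`, `max_nesting_depth`) are hypothetical. Like mywait, `sim_wait` runs whatever task is at the head of the queue on top of the current stack while the awaited task is unfinished, and the model records the deepest nesting reached.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 4096   /* enough for fib arguments up to 16 in this model */

typedef struct
{
  int n;           /* fib argument carried by the task */
  int completed;   /* set once the task has finished */
} sim_task_t;

static sim_task_t tasks[MAX_TASKS];  /* task storage, doubles as the FIFO */
static size_t head, tail;            /* dequeue at head, enqueue at tail */
static int depth, max_depth;         /* current and deepest nesting level */

static void run_fib(sim_task_t *t);

/* Like async: create a task and enqueue it. */
static sim_task_t *sim_async(int n)
{
  assert(tail < MAX_TASKS);
  sim_task_t *t = &tasks[tail++];
  t->n = n;
  t->completed = 0;
  return t;
}

/* Like mywait: help with queued tasks until the awaited one is done.
   Each helped task runs nested on top of the current call stack. */
static void sim_wait(sim_task_t *awaited)
{
  while (!awaited->completed)
    {
      assert(head < tail);
      run_fib(&tasks[head++]);
    }
}

/* A fib-shaped task: spawn two subtasks, then wait for both. */
static void run_fib(sim_task_t *t)
{
  if (++depth > max_depth)
    max_depth = depth;
  if (t->n >= 2)
    {
      sim_task_t *a = sim_async(t->n - 1);
      sim_task_t *b = sim_async(t->n - 2);
      sim_wait(a);
      sim_wait(b);
    }
  t->completed = 1;
  depth--;
}

/* Runs the model for fib(n) and returns the deepest nesting observed. */
int max_nesting_depth(int n)
{
  head = tail = 0;
  depth = max_depth = 0;
  sim_task_t *root = sim_async(n);
  head = 1;   /* the root task is taken off the queue directly */
  run_fib(root);
  return max_depth;
}
```

Printing max_nesting_depth(n) for increasing n shows how quickly the nesting outgrows the roughly n frames that a plain recursive fib would need. Each nested level in the real program also carries the mywait and task-function frames, plus the malloc frames on top.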

Thank you all! If it weren't for your help, I wouldn't have been able to figure it out!
