递归纤维蛋白原与线程,分段错误? [英] Recursive Fib with Threads, Segmentation Fault?

查看:84
本文介绍了递归纤维蛋白原与线程,分段错误?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

任何想法,为什么它工作正常,像0,1,2,3,4 ...赛格像> 15个值故障值?
    #包括
    #包括
    的#include

 无效* FIB(无效* fibToFind);主要(){
的pthread_t mainthread;长fibToFind = 15;
长finalFib;在pthread_create(安培; mainthread,NULL,FIB(无效*)fibToFind);在pthread_join(mainthread,(无效*)及finalFib);的printf(的个数为:%d个\\ N,finalFib);
}
void *的FIB(无效* fibToFind){
长RETVAL;长newFibToFind =((长)fibToFind);长returnMinusOne;
长returnMinustwo;的pthread_t minusone;
的pthread_t minustwo;如果(newFibToFind == 0 || newFibToFind == 1)
返回newFibToFind;其他{
长newFibToFind1 =((长)fibToFind) - 1;
长newFibToFind2 =((长)fibToFind) - 2;在pthread_create(安培; minusone,NULL,FIB(无效*)newFibToFind1);
在pthread_create(安培; minustwo,NULL,FIB(无效*)newFibToFind2);在pthread_join(minusone,(无效*)及returnMinusOne);
在pthread_join(minustwo,(无效*)及returnMinustwo);返回returnMinusOne + returnMinustwo;}}


解决方案

内存用完(满分为堆栈空间),或有效线程句柄?

您正在寻求一个可怕的很多线程,这需要大量的堆栈/背景的。
的Windows(和Linux)有一个愚蠢的大[连续]堆栈的想法。

从上pthreads_create的文档:
在Linux / X86-32,对于一个新的线程默认堆栈大小为2兆字节。
如果您制造万线,你需要的内存20 GB。
我建了一个版本的OP的计划,并与一些35​​00(P)线程轰炸
在Windows XP64。

请参阅本SO线程为什么大筹码是一个非常糟糕的主意更多详细信息:
<一href=\"http://stackoverflow.com/questions/3217521/why-are-stack-overflows-still-a-problem/3217680#3217680\">Why在栈溢出还是一个问题?

如果你放弃了大量筹码,并实现与堆分配并行语言
激活记录
(我们的 PARLANSE
其中之一)的问题消失。

下面是我们在PARLANSE写了第一个(顺序)程序:

 (定义fibonacci_argument 45)(定义斐波纳契
   (拉姆达(功能的天然自然的)功能
   `给定n,n次计算斐波纳契数
      (ifthenelse(小于α= 1)
           ?
         (+(斐波纳契( - ))
              (斐波纳契( - 2))
           )+
   )ifthenelse
   )拉姆达
 )定义

下面是上i7的执行运行:

  C:\\ DMS \\域\\ PARLANSE \\ TOOLS \\ PerformanceTest&gt;运行fibonaccisequential
 启动顺序斐波那契数(45)...运行时:33.752067秒
 结果:1134903170

下面是第二个,这是平行的:

 (定义coarse_grain_threshold 30);技术的不断:调跨大量的工作分摊开销叉(定义parallel_fibonacci
   (拉姆达(功能的天然自然的)功能
   `给定n,n次计算斐波纳契数
      (ifthenelse(小于?= coarse_grain_threshold)
           (斐波那契数?)
           (让(;; [n个自然] [M天然])
                   (值(||(= M(parallel_fibonacci( - )))=
                              (= N(parallel_fibonacci( - 2)))=
                          )||
                          (+ M N)
                   )值
           )让
       )ifthenelse
   )拉姆达
)定义

使得并行明确,使方案更容易写了。

并行版本我们测试通过调用(parallel_fibonacci 45)。这里
在同一I7运行执行(这可以说是有8个处理器,
但它确实是4个处理器超线程所以它确实是不太8
等效的CPU):

  C:\\ DMS \\域\\ PARLANSE \\ TOOLS \\ PerformanceTest&GT; fibonacciparallelcoarse运行
并行粗粒度斐波那契数(45)使用截止30 ...运行时:5.511126秒
结果:1134903170

6+附近的一个加速,不坏不-相当-8个处理器。一个在另一个的
这个问题的答案跑pthreads的版本;花几秒钟
(炸毁)计算纤维蛋白原(18),这5.5秒,纤维蛋白原(45)。
这告诉你的pthreads
是从根本上糟糕的方式做到细粒度并行的地段的,因为
它具有真正的,实在是高分叉的开销。 (PARLANSE被设计为
最大限度地减少分叉开销)。

下面就是,如果你的技术的不断设置上的每个的通话零(叉发生了什么
到FIB):

  C:\\ DMS \\域\\ PARLANSE \\ TOOLS \\ PerformanceTest&gt;运行fibonacciparallel
开始并行斐波那契数(45)...运行时:15.578779秒
结果:1134903170

您可以看到,摊销叉的开销是一个好主意,即使你有快速的叉子。

纤维蛋白原(45)产生颗粒的的很多的。堆分配
活动记录解决了OP的一阶问题(每十万的pthreads的
与堆栈1MB的灼伤的RAM千兆字节)。

但有一个二阶的问题:2 ^ 45 PARLANSE五谷会烧你所有的记忆太
只是保持颗粒的轨道,即使你的粮食控制块很小。
因此,它有助于有节流叉调度,一旦你有很多
(对于很多定义一些显著少腋臭2 ^ 45)谷物prevent的
从粮跟踪数据结构淹没在机器的并行性爆炸。
它具有unthrottle叉时晶粒的数量低于阈值
同样,以确保总有说不完的逻辑,并行工作的物理
的CPU做的。

Any ideas why it works fine for values like 0, 1, 2, 3, 4... and seg faults for values like >15? #include #include #include

void *fib(void *fibToFind);

main(){
pthread_t mainthread;

long fibToFind = 15;
long finalFib;

pthread_create(&mainthread,NULL,fib,(void*) fibToFind);

pthread_join(mainthread,(void*)&finalFib);

printf("The number is: %d\n",finalFib);
}


void *fib(void *fibToFind){
long retval;

long newFibToFind = ((long)fibToFind);

long returnMinusOne;
long returnMinustwo;

pthread_t minusone;
pthread_t minustwo;

if(newFibToFind == 0 || newFibToFind == 1)
return newFibToFind;

else{
long newFibToFind1 = ((long)fibToFind) - 1;
long newFibToFind2 = ((long)fibToFind) - 2;

pthread_create(&minusone,NULL,fib,(void*) newFibToFind1);
pthread_create(&minustwo,NULL,fib,(void*) newFibToFind2);

pthread_join(minusone,(void*)&returnMinusOne);
pthread_join(minustwo,(void*)&returnMinustwo);

return returnMinusOne + returnMinustwo;

}

}

解决方案

Runs out of memory (out of space for stacks), or valid thread handles?

You're asking for an awful lot of threads, which require lots of stack/context. Windows (and Linux) have a stupid "big [contiguous] stack" idea.

From the documentation on pthreads_create: "On Linux/x86-32, the default stack size for a new thread is 2 megabytes." If you manufacture 10,000 threads, you need 20 Gb of RAM. I built a version of OP's program, and it bombed with some 3500 (p)threads on Windows XP64.

See this SO thread for more details on why big stacks are a really bad idea: Why are stack overflows still a problem?

If you give up on big stacks, and implement a parallel language with heap allocation for activation records (our PARLANSE is one of these) the problem goes away.

Here's the first (sequential) program we wrote in PARLANSE:

(define fibonacci_argument 45)

(define fibonacci
   (lambda(function natural natural )function 
   `Given n, computes nth fibonacci number'
      (ifthenelse (<= ? 1)
           ?
         (+ (fibonacci (-- ?))
              (fibonacci (- ? 2))
           )+
   )ifthenelse  
   )lambda
 )define

Here's an execution run on an i7:

 C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonaccisequential
 Starting Sequential Fibonacci(45)...Runtime: 33.752067 seconds
 Result: 1134903170

Here's the second, which is parallel:

(define coarse_grain_threshold 30) ; technology constant: tune to amortize fork overhead across lots of work

(define parallel_fibonacci
   (lambda (function natural natural )function 
   `Given n, computes nth fibonacci number'
      (ifthenelse (<= ? coarse_grain_threshold)
           (fibonacci ?)
           (let (;; [n natural ] [m natural ]  )
                   (value (|| (= m (parallel_fibonacci (-- ?)) )=
                              (= n (parallel_fibonacci (- ? 2)) )=
                          )||
                          (+ m n)
                   )value
           )let
       )ifthenelse  
   )lambda
)define

Making the parallelism explicit makes the programs a lot easier to write, too.

The parallel version we test by calling (parallel_fibonacci 45). Here is the execution run on the same i7 (which arguably has 8 processors, but it is really 4 processors hyperthreaded so it really isn't quite 8 equivalent CPUs):

C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonacciparallelcoarse
Parallel Coarse-grain Fibonacci(45) with cutoff 30...Runtime: 5.511126 seconds
Result: 1134903170

A speedup near 6+, not bad for not-quite-8 processors. One of the other answers to this question ran the pthreads version; it took "a few seconds" (to blow up) computing Fib(18), and this is 5.5 seconds for Fib(45). This tells you pthreads is a fundamentally bad way to do lots of fine grain parallelism, because it has really, really high forking overhead. (PARLANSE is designed to minimize that forking overhead).

Here's what happens if you set the technology constant to zero (forks on every call to fib):

C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonacciparallel
Starting Parallel Fibonacci(45)...Runtime: 15.578779 seconds
Result: 1134903170

You can see that amortizing fork overhead is a good idea, even if you have fast forks.

Fib(45) produces a lot of grains. Heap allocation of activation records solves the OP's first-order problem (thousands of pthreads each with 1Mb of stack burns gigabytes of RAM).

But there's a second order problem: 2^45 PARLANSE "grains" will burn all your memory too just keeping track of the grains even if your grain control block is tiny. So it helps to have a scheduler that throttles forks once you have "a lot" (for some definition of "a lot" significantly less thatn 2^45) grains to prevent the explosion of parallelism from swamping the machine with "grain" tracking data structures. It has to unthrottle forks when the number of grains falls below a threshold too, to make sure there is always lots of logical, parallel work for the physical CPUs to do.

这篇关于递归纤维蛋白原与线程,分段错误?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆