Segmentation fault after removing debug printing


Problem description

I have a (for me) very weird segmentation fault. At first, I thought it was interference between my 4 cores due to OpenMP, although removing OpenMP from the equation is not what I want. It turns out that when I do remove it, the segfault still occurs.

What's weird is that if I add a print or write statement anywhere within the inner do loop, it works.

subroutine histogrambins(rMatrix, N, L,  dr, maxBins, bins)
    implicit none;

    double precision, dimension(N,3), intent(in):: rMatrix;
    integer, intent(in) :: maxBins, N;
    double precision, intent(in) :: L, dr;

    integer, dimension(maxBins, 1), intent(out)  :: bins;

    integer :: i, j, b;  
    double precision, dimension(N,3) :: cacheParticle, cacheOther;
    double precision :: r;
    do b= 1, maxBins
        bins(b,1) = 0;
    end do 
    !$omp parallel do &
    !$omp default(none) &
    !$omp firstprivate(N, L, dr, rMatrix, maxBins) &
    !$omp private(cacheParticle, cacheOther, r, b) &
    !$omp shared(bins) 
    do i = 1,N 
        do j = 1,N
            !Check the pair distance between this one (i) and its (j) closest image 
            if (i /= j) then 
                !should be faster, because it doesn't have to look for matrix indices 
                cacheParticle(1, :) = rMatrix(i,:); 
                cacheOther(1, :) = rMatrix(j, :);  

                call inbox(cacheParticle, L);
                call inbox(cacheOther, L);  
                call closestImage(cacheParticle, cacheOther, L);    
                r = sum( (cacheParticle - cacheOther) * (cacheParticle - cacheOther) ) ** .5; 
                if (r /= r) then
                    ! r is NaN 
                     bins(maxBins,1) = bins(maxBins,1) + 1;
                else   
                     b = floor(r/dr);
                     if (b > maxBins) then
                         b = maxBins;
                     end if     

                     bins(b,1) = bins(b,1) + 1;
                end if
            end if
        end do
    end do
    !$omp end parallel do
end subroutine histogramBins 

I enabled --debug-capi in the f2py command:

f2py --fcompiler=gfortran --f90flags="-fopenmp -fcheck=all" -lgomp --debug-capi --debug -c -m modulename module.f90

Which gives me this:

debug-capi:Fortran subroutine histogrambins(rmatrix,&n,&l,&dr,&maxbins,bins)'
At line 320 of file mol-dy.f90
Fortran runtime error: Aborted

It also does a load of other checking, listing the arguments given, the other subroutines called, and so on.

Anyway, the two subroutines called inside the loop (inbox and closestImage) are both non-parallel. I use them in several other subroutines, and I thought it best not to call a parallel subroutine from within the parallel code of another subroutine. So, while this subroutine is running, no other parallel code should be active.

What's going on here? How can adding "print *, ;" cause a segfault to go away?

Thank you for your time.

Solution

It's not unusual for print statements to have an impact and either create or remove a segfault. The reason is that they change the way memory is laid out: room has to be made for the string being printed, and for temporary strings if any formatting is done. That change can be enough to make a latent bug, such as an out-of-bounds write or a read of uninitialized memory, show up as a crash for the first time, or to make the crash disappear.
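As an illustration of the kind of latent bug described above, here is a minimal sketch of one candidate in the posted routine. This is an assumption, since the answer does not identify the actual cause: `b = floor(r/dr)` evaluates to 0 whenever `r < dr`, which is not a valid index into an array declared with bounds `1:maxBins`. The program name `bin_index_check` and the sample distances are hypothetical and chosen only for demonstration.

```fortran
program bin_index_check
    implicit none
    integer, parameter :: maxBins = 10
    double precision, parameter :: dr = 0.5d0
    integer :: bins(maxBins)
    double precision :: r(4)
    integer :: k, b

    bins = 0
    ! Hypothetical sample distances: shorter than dr, inside the range,
    ! and beyond the last bin
    r = [0.1d0, 0.7d0, 4.9d0, 12.0d0]

    do k = 1, size(r)
        b = floor(r(k) / dr)          ! 0 when r < dr: outside bins(1:maxBins)
        b = max(1, min(b, maxBins))   ! clamp into 1..maxBins before indexing
        bins(b) = bins(b) + 1
        print '(a, f6.2, a, i0)', 'r = ', r(k), ' -> bin ', b
    end do
end program bin_index_check
```

Clamping is only one possible choice. Using `floor(r/dr) + 1` as the index would instead map distances in `[0, dr)` to the first bin without merging them with `[dr, 2*dr)`; which mapping is intended depends on how the histogram is used afterwards.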

I see you're calling this from Python. If you're using Linux, you could try following a guide to using a debugger with Fortran called from Python, and find the line and the data values that cause the crash. This method also works with OpenMP. You can also try using GDB as the debugger.

Without the source code to your problem, I don't think you're likely to get an "answer" to the question, but hopefully the above ideas will help you to solve this yourself.

In my experience, a debugger is considerably less likely than print statements to produce this now-you-see-it-now-you-don't behaviour (almost certainly so if only one thread is used).
