Is armadillo solve() thread safe?


Question

In my code I have a loop in which I construct an overdetermined linear system and try to solve it:

#pragma omp parallel for
for (int i = 0; i < n[0]+1; i++) {
    for (int j = 0; j < n[1]+1; j++) {
        for (int k = 0; k < n[2]+1; k++) {
            arma::mat A(max_points, 2);
            arma::mat y(max_points, 1);
            // initialize A and y

            arma::vec solution = solve(A,y);
        }
    }
}

Sometimes, quite randomly, the program hangs or the results in the solution vector are NaN. If I do this instead:

arma::vec solution;
#pragma omp critical 
{
    solution = solve(weights*A,weights*y);
}

then these problems don't seem to happen anymore.

When it hangs, it does so because some threads are waiting at the OpenMP barrier:

Thread 2 (Thread 0x7fe4325a5700 (LWP 39839)):
#0  0x00007fe44d3c2084 in gomp_team_barrier_wait_end () from /usr/lib64/gcc-4.9.2/lib64/gcc/x86_64-redhat-linux-gnu/4.9.2/libgomp.so.1
#1  0x00007fe44d3bf8c2 in gomp_thread_start () at ../.././libgomp/team.c:118
#2  0x0000003f64607851 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003f642e890d in clone () from /lib64/libc.so.6

And the other threads are stuck inside Armadillo:

Thread 1 (Thread 0x7fe44afe2e60 (LWP 39800)):
#0  0x0000003ee541f748 in dscal_ () from /usr/lib64/libblas.so.3
#1  0x00007fe44c0d3666 in dlarfp_ () from /usr/lib64/atlas/liblapack.so.3
#2  0x00007fe44c058736 in dgelq2_ () from /usr/lib64/atlas/liblapack.so.3
#3  0x00007fe44c058ad9 in dgelqf_ () from /usr/lib64/atlas/liblapack.so.3
#4  0x00007fe44c059a32 in dgels_ () from /usr/lib64/atlas/liblapack.so.3
#5  0x00007fe44f09fb3d in bool arma::auxlib::solve_ud<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> >(arma::Mat<double>&, arma::Mat<double>&, arma::Base<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> > const&) () at /usr/include/armadillo_bits/lapack_wrapper.hpp:677
#6  0x00007fe44f0a0f87 in arma::Col<double>::Col<arma::Glue<arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::glue_solve> >(arma::Base<double, arma::Glue<arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::glue_solve> > const&) ()
at /usr/include/armadillo_bits/glue_solve_meat.hpp:39

As you can see from the stack trace, my version of Armadillo uses ATLAS. According to this documentation, ATLAS seems to be thread safe: ftp://lsec.cc.ac.cn/netlib/atlas/faq.html#tsafe

Update, September 11, 2015

I finally got some time to run more tests, based on the suggestions of Vladimir F.

When I compile Armadillo with ATLAS's BLAS, I'm still able to reproduce the hangs and the NaNs. When it hangs, the only thing that changes in the stack trace is the call to BLAS:

#0  0x0000003fa8054718 in ATL_dscal_xp1yp0aXbX@plt () from /usr/lib64/atlas/libatlas.so.3
#1  0x0000003fb05e7666 in dlarfp_ () from /usr/lib64/atlas/liblapack.so.3
#2  0x0000003fb0576a61 in dgeqr2_ () from /usr/lib64/atlas/liblapack.so.3
#3  0x0000003fb0576e06 in dgeqrf_ () from /usr/lib64/atlas/liblapack.so.3
#4  0x0000003fb056d7d1 in dgels_ () from /usr/lib64/atlas/liblapack.so.3
#5  0x00007ff8f3de4c34 in void arma::lapack::gels<double>(char*, int*, int*, int*, double*, int*, double*, int*, double*, int*, int*) () at /usr/include/armadillo_bits/lapack_wrapper.hpp:677
#6  0x00007ff8f3de1787 in bool arma::auxlib::solve_od<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> >(arma::Mat<double>&, arma::Mat<double>&, arma::Base<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> > const&) () at /usr/include/armadillo_bits/auxlib_meat.hpp:3434

Compiling without ATLAS, with only the netlib BLAS and LAPACK, I was able to reproduce the NaNs but not the hangs.

In both cases, when I surround solve() with #pragma omp critical, I have no problems at all.

Answer

Are you sure your systems are overdetermined? solve_ud in your stack trace says otherwise. You have solve_od too, though, so this probably has nothing to do with the issue. Still, it doesn't hurt to find out why that's happening and fix it if you think the systems should be overdetermined.



Is armadillo solve() thread safe?

I think that depends on your LAPACK version; also see this. Looking at the code of solve_od, all the variables accessed seem to be local. Note the warning in the code:



NOTE: the dgels() function in the lapack library supplied by ATLAS 3.6 seems to have problems

Thus it seems only lapack::gels can cause trouble for you. If fixing LAPACK is not possible, a workaround is to stack your systems and solve a single large system. That would probably be even more efficient if your individual systems are small.
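A further option, beyond what is proposed above: since A in the question has only two columns, each small system could be solved without calling LAPACK at all, through the 2x2 normal equations. The sketch below swaps in that technique and is only an illustration under assumptions, not Armadillo's method: `solve_two_column_ls` is a hypothetical helper, the two columns of A and the right-hand side are passed as plain `std::vector<double>`, and normal equations are less numerically robust than the QR factorization used by dgels, so this may only suit well-conditioned data.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of Armadillo or LAPACK).
// Least-squares solution of A x = y where A has exactly two columns,
// a0 and a1, obtained from the 2x2 normal equations (A^T A) x = A^T y.
// All three vectors are assumed to have the same length.
// Only the arguments are read and only locals are written.
std::array<double, 2> solve_two_column_ls(const std::vector<double>& a0,
                                          const std::vector<double>& a1,
                                          const std::vector<double>& y) {
    double s00 = 0.0, s01 = 0.0, s11 = 0.0;  // entries of A^T A
    double b0 = 0.0, b1 = 0.0;               // entries of A^T y
    for (std::size_t r = 0; r < y.size(); ++r) {
        s00 += a0[r] * a0[r];
        s01 += a0[r] * a1[r];
        s11 += a1[r] * a1[r];
        b0 += a0[r] * y[r];
        b1 += a1[r] * y[r];
    }

    const double det = s00 * s11 - s01 * s01;
    const double eps = std::numeric_limits<double>::epsilon();
    if (std::abs(det) <= eps * s00 * s11) {
        // Columns are (numerically) linearly dependent: no unique solution.
        const double nan = std::numeric_limits<double>::quiet_NaN();
        return {nan, nan};
    }

    // Apply the explicit inverse of the symmetric 2x2 matrix A^T A.
    return {(s11 * b0 - s01 * b1) / det, (s00 * b1 - s01 * b0) / det};
}
```

Because the function keeps no shared state, calling it from inside the `#pragma omp parallel for` loop should not need a critical section. Any weights would have to be applied to the columns and to y before the call.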
