Rfast hd.eigen()返回NA,但基本eigen()不返回 [英] Rfast hd.eigen() returns NAs but base eigen() does not

查看:372
本文介绍了Rfast hd.eigen()返回NA,但基本eigen()不返回的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Rfast中的hd.eigen时遇到问题.对于大多数数据,它给出的结果与eigen极为接近,但是hd.eign有时会返回空的$vector,NA或其他不良结果. 例如:

I am having problems with hd.eigen in Rfast. It gives extremely close results to eigen with most data, but sometimes hd.eign returns an empty $vector, NAs, or other undesirable results. For example:

> set.seed(123)
> bigm <- matrix(rnorm(2000*2000,mean=0,sd = 3), 2000, 2000)
> 
> e3 = eigen(bigm)
> length(e3$values)
[1] 2000
> length(e3$vectors)
[1] 4000000
> sum(is.na(e3$vectors) == TRUE)
[1] 0
> sum(is.na(e3$vectors) == FALSE)
[1] 4000000
> 
> e4 = hd.eigen(bigm, vectors = TRUE)
> length(e4$values)
[1] 2000
> length(e4$vectors)
[1] 4000000
> sum(is.na(e4$vectors) == TRUE)
[1] 2000
> sum(is.na(e4$vectors) == FALSE)
[1] 3998000

除了它破坏了我的脚本之外,这些NA是否还表明我的数据存在更深层次的问题?还是hd.eig无法处理股票eigen()可以处理的某些情况?一个比另一个好吗?

Other than the fact that it breaks my script, do these NAs indicate a deeper problem with my data? Or is hd.eig not able to handle some situations that the stock eigen() can? Is one better than the other?

根据Ralf的建议,我检查了我的BLAS版本,似乎R可能正在寻找错误的版本/在错误的位置:

As per Ralf's suggestion, I checked my BLAS versions, and it does seem like maybe R is looking for the wrong version/in the wrong place:

~ $ ldd /usr/lib64/R/bin/exec/R
        linux-vdso.so.1 (0x00007ffeec3b9000)
        libR.so => not found
        libRblas.so => not found
        libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007feb27ef2000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007feb27ecf000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00007feb27cdb000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007feb27f7b000)

此外,我不清楚openBLAS是否等效于其他发行版中默认安装的BLAS.

Also, I am unclear on whether openBLAS is equivalent to the BLAS that is installed by default in other distros.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-generic-linux-gnu (64-bit)
Running under: Clear Linux OS

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas_nehalemp-r0.3.6.so

edit 2:我在基于CentOS的HPC系统上尝试了相同的示例,但没有得到任何NA.在那里,sessionInfo()显示:

edit 2: I tried the same example on a CentOS-based HPC system, and did not get any NA's. There, sessionInfo() reveals:

BLAS/LAPACK: /hpc/packages/minerva-centos7/intel/parallel_studio_xe_2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

hd.eign中产生NA的表达式是

vectors <- tcrossprod(y, t(FF) * L^(-0.5))

具体来说,L^(-0.5)在索引2000处生成NaN

specifically, L^(-0.5) produces NaN at index 2000

> L[2000]
[1] -1.136237e-12

但是,在没有返回NA的两台计算机上,L [2000]为正(尽管略有不同,在HPC系统上是5.822884e-14,在运行Microsoft R的Windows计算机上是3.022511e-12.

However, on the two machines where no NAs are returned L[2000] is positive (although slightly different, 5.822884e-14 on the HPC system and 3.022511e-12 on my Windows machine running Microsoft build of R.

差异似乎起源于基本的eigen()函数,该函数从问题计算机上的crossprod()矩阵xx返回一个负值,而不是其他两个.我保存了xx对象并在计算机之间打开,所以我知道eigen()的输入是完全相同的.

Edit 4: The difference appears to originate in the base eigen() function, which returns one negative value from the crossprod() matrix xx on the problem machine but not not the other two. I saved the xx object and opened between computers, so I know that the input to eigen() was exactly the same.

我更深入地研究了一个层次,发现原始负值来自eigen()

Edit 5: I drilled one level deeper and found that the original negative value comes from this statement in eigen()

    z <- if (!complex.x) 
      .Internal(La_rs(x, only.values))
    else .Internal(La_rs_cmplx(x, only.values))

如果我另存为CSV,然后重新打开,则问题计算机不会产生负特征值.

Edit 6: If I save as a CSV and then re-open, the problem computer does not produce negative eigenvalues.

> load("/home/james/nfs-cloud/PanosLab/CircRNA/input_to_La_rs.Rdata")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 1
> write.csv(x, "test_for_internal.csv", row.names = FALSE)
> x <- read.csv("test_for_internal.csv")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 0

这能给任何人一个线索吗?这是一个错误吗?

Does that give anyone a clue? Is this a bug?

推荐答案

Rfast中的hd.eigen函数仅针对n小于p的情况而设计.

The hd.eigen function in Rfast is designed for the case of n smaller than p only.

这篇关于Rfast hd.eigen()返回NA,但基本eigen()不返回的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆