How to calculate normalized ratios in all possible combinations efficiently for a large matrix in R?

Question

I want to efficiently calculate normalised ratios in all possible combinations for a large matrix in R. I asked a similar question earlier with a small dataset, and the solutions provided there worked fine. But when I try to apply the same solutions to a large dataset (400 x 2151), my system hangs. My machine has an Intel i7 processor and 16 GB of RAM. Here is the code with data:

df <- matrix(rexp(860400), nrow = 400, ncol = 2151)

Solution provided by @Ronak Shah

cols <- 1:ncol(df)
temp <- expand.grid(cols, cols)
new_data <- (df[,temp[,2]] - df[,temp[,1]])/(df[,temp[,2]] + df[,temp[,1]])

Solution provided by @akrun

f1 <- function(i, j) (df[, i] - df[, j])/(df[, i] + df[, j])
out <- outer(seq_along(df), seq_along(df), FUN = f1)
colnames(out) <- outer(names(df), names(df), paste, sep = "_")

Both solutions take a very long time and the system hangs. So how can I do this efficiently?
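
A rough size estimate suggests why both attempts struggle on a 16 GB machine. The sketch below only does the arithmetic for the dimensions given in the question; the variable names are illustrative.

```r
n_row <- 400
n_col <- 2151

# Every ordered pair (i, j), including i == j, as produced by expand.grid()
n_pairs <- n_col^2

# A numeric (double) value takes 8 bytes
bytes_result <- n_row * n_pairs * 8
gib_result <- bytes_result / 1024^3

n_pairs      # 4626801 result columns
gib_result   # about 13.8 GiB for the result matrix alone
```

The expand.grid() version also builds df[, temp[, 2]] and df[, temp[, 1]], each as large as the result, before the difference and the sum are computed, so peak memory use is likely a multiple of this figure.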

Recommended answer

Since memory seems to be your main issue, how about using iterators? Using the package RcppAlgos*, we can make use of permuteIter to calculate your ratios N at a time.
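
The same "a few at a time" idea can also be sketched in base R, without RcppAlgos, by handling one reference column per step. This is a different technique from the iterator approach described in this answer. It assumes each block can be reduced or written out before the next one is computed, and the reduction used here (largest absolute ratio) is purely hypothetical.

```r
set.seed(1)
# Small stand-in for the 400 x 2151 matrix so the sketch runs instantly
df <- matrix(rexp(40), nrow = 4, ncol = 10)

# Hypothetical per-block reduction: keep only the largest absolute ratio
max_abs_ratio <- numeric(ncol(df))

for (j in seq_len(ncol(df))) {
    # Block with the same shape as df: column k holds
    # (df[, k] - df[, j]) / (df[, k] + df[, j])
    block <- (df - df[, j]) / (df + df[, j])
    # Column j is the i == j case, so leave it out of the reduction
    max_abs_ratio[j] <- max(abs(block[, -j]))
}

max_abs_ratio
```

For the full 400 x 2151 matrix each block holds 400 * 2151 doubles, which is under 7 MB, so only one small block needs to be in memory at any time.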

If you must have names, you need an additional iterator. This means keeping 2 iterators in sync, which can become tedious. Fortunately, with the summary() method of permuteIter, we can always see what the current index is, and we can reposition the iterators with a wide range of options (e.g. random access with [[, front(), back(), or startOver()).

library(RcppAlgos)
df <- matrix(rexp(860400), nrow = 400, ncol = 2151)

ratioIter <- permuteIter(ncol(df), 2, FUN = function(x) {
    (df[, x[2]] - df[, x[1]]) / (df[, x[2]] + df[, x[1]])
})

## if you really want to name your output, you must have
## an additional name iterator... not very elegant
nameIter <- permuteIter(paste0("col", 1:ncol(df)), 2, FUN = function(x) {
    paste0(rev(x), collapse = "_")
})

firstIter <- matrix(ratioIter$nextIter(), ncol = 1)
firstName <- nameIter$nextIter()
colnames(firstIter) <- firstName

head(firstIter)
      col2_col1
[1,]  0.2990054
[2,] -0.9808111
[3,] -0.9041054
[4,]  0.7970873
[5,]  0.8625776
[6,]  0.2768359

## nextNIter() returns a list, so we combine the pieces with do.call(cbind, ...)
next5Iter <- do.call(cbind, ratioIter$nextNIter(5))
next5Names <- unlist(nameIter$nextNIter(5))
colnames(next5Iter) <- next5Names

head(next5Iter)
       col3_col1  col4_col1   col5_col1  col6_col1  col7_col1
[1,] -0.28099710  0.1665687  0.40565958 -0.7524038 -0.7132844
[2,] -0.81434900 -0.4283759 -0.89811556 -0.8462906 -0.5399741
[3,] -0.02289368  0.4285012  0.05087853 -0.5091659 -0.2328995
[4,] -0.06825458  0.3126928  0.68968843 -0.2180618  0.6651785
[5,]  0.33508319  0.7389108  0.84733425  0.9065263  0.8977107
[6,]  0.61773589  0.3443120  0.61084584  0.5727938  0.3888807
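
One way to avoid keeping two iterators in sync is to iterate over the index pairs only and derive both the ratios and the names from the same indices. The sketch below is an alternative to the FUN-based iterators above, not a replacement suggested by the original answer. It assumes that permuteIter called without FUN returns the permutations as rows of a matrix, and idxIter, a, b and chunk are illustrative names.

```r
library(RcppAlgos)

set.seed(42)
# Small stand-in matrix so the sketch runs quickly
df <- matrix(rexp(40), nrow = 4, ncol = 10)

# No FUN here: the iterator hands back the index pairs themselves
idxIter <- permuteIter(ncol(df), 2)

# One row per permutation; column 1 is x[1], column 2 is x[2]
idx <- idxIter$nextNIter(5)

a <- df[, idx[, 1], drop = FALSE]
b <- df[, idx[, 2], drop = FALSE]

# Same formula as in ratioIter, applied to five pairs at once
chunk <- (b - a) / (b + a)

# Names come from the same indices, so they cannot drift out of sync
colnames(chunk) <- paste0("col", idx[, 2], "_col", idx[, 1])

colnames(chunk)
```

Because the ratios and the names are both computed from idx, there is only one iterator position to track.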

You should note that this does not produce results where i == j (for those the numerator is zero, so the ratio is trivially 0, or NaN if the value itself is 0). So the total number is just under 2151^2 (in fact, it is exactly 2151^2 - 2151).

ratioIter$summary()
$description
[1] "Permutations of 2151 choose 2"

$currentIndex
[1] 6

$totalResults
[1] 4624650

$totalRemaining
[1] 4624644
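
The total reported by summary() can be cross-checked with a little base R arithmetic. This is only a sanity check on the count of ordered pairs with i != j, and the variable names are illustrative.

```r
n <- 2151

# Ordered pairs (i, j) with i != j: the same value as n^2 - n
n_ordered_pairs <- n * (n - 1)

# Cross-check by enumerating every (i, j) and dropping the diagonal
grid <- expand.grid(i = seq_len(n), j = seq_len(n))
n_off_diagonal <- sum(grid$i != grid$j)

n_ordered_pairs   # 4624650
n_off_diagonal    # 4624650
```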

Random access and backward iteration are available as well:

## Get the last ratio
lastIter <- ratioIter$back()
lastName <- nameIter$back()
mLast <- matrix(lastIter, ncol = 1)
colnames(mLast) <- lastName

head(mLast)
     col2150_col2151
[1,]      -0.6131926
[2,]       0.9936783
[3,]       0.1373538
[4,]       0.1014347
[5,]      -0.5061608
[6,]       0.5773503

## iterate backwards with the previous methods
prev5Iter <- do.call(cbind, ratioIter$prevNIter(5))
prev5Names <- unlist(nameIter$prevNIter(5))
colnames(prev5Iter) <- prev5Names

head(prev5Iter)
     col2149_col2151 col2148_col2151 col2147_col2151 col2146_col2151 col2145_col2151
[1,]     -0.75500069     -0.72757136     -0.94457988     -0.82858884     -0.25398782
[2,]      0.99696694      0.99674084      0.99778638      0.99826472      0.95738947
[3,]      0.27701596      0.45696010      0.00682574      0.01529448     -0.62368764
[4,]     -0.09508689     -0.90698165     -0.38221934     -0.41405984      0.01371556
[5,]     -0.31580709     -0.06561386     -0.07435058     -0.08033145     -0.90692881
[6,]      0.82697720      0.86858595      0.81707206      0.75627297      0.46272349

## Get a random sample
set.seed(123)
randomIter <- do.call(cbind, ratioIter[[sample(4624650, 5)]])

## We must reset the seed in order to get the same output for the names
set.seed(123)
randomNames <- unlist(nameIter[[sample(4624650, 5)]])
colnames(randomIter) <- randomNames

head(randomIter)
     col1044_col939 col20_col1552 col412_col2014 col1751_col1521 col337_col1295
[1,]     -0.3902066     0.4482747   -0.108018200      -0.1662857     -0.3822436
[2,]     -0.2358101     0.9266657   -0.657135882       0.0671608     -0.6821823
[3,]     -0.7054217     0.8944720    0.092363665       0.2667708      0.1908249
[4,]     -0.1574657     0.2775225   -0.221737223       0.3381454     -0.5705021
[5,]     -0.4282909    -0.4406433    0.092783086      -0.7506674     -0.1276932
[6,]      0.9998189    -0.2497586   -0.009375891       0.7071864     -0.2425258

Lastly, it is written in C++ so it is very fast:

system.time(ratioIter$nextNIter(1e3))
#  user  system elapsed 
#     0       0       0
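
Putting the pieces together, a full pass over every permutation in fixed-size chunks might look like the sketch below. It makes three assumptions that are not in the original answer: the index-pair iterator (no FUN) returns a matrix, each chunk can be reduced to one number per pair (here the column mean, chosen only for illustration), and a chunk size of 25 suits this toy matrix. For the real data the chunk size would need tuning to the available memory.

```r
library(RcppAlgos)

set.seed(42)
# Small stand-in matrix: 10 columns give 10 * 9 = 90 ordered pairs
df <- matrix(rexp(40), nrow = 4, ncol = 10)

idxIter <- permuteIter(ncol(df), 2)
total <- idxIter$summary()$totalResults
chunkSize <- 25                       # hypothetical value

# One summary value per pair instead of the full ratio matrix
pairMeans <- numeric(total)
done <- 0

while (done < total) {
    # Never ask for more permutations than are left
    n <- min(chunkSize, total - done)
    idx <- idxIter$nextNIter(n)

    a <- df[, idx[, 1], drop = FALSE]
    b <- df[, idx[, 2], drop = FALSE]

    pairMeans[done + seq_len(n)] <- colMeans((b - a) / (b + a))
    done <- done + n
}

length(pairMeans)
```

Only one chunk of ratios exists at a time, so peak memory depends on chunkSize rather than on the total number of pairs.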

* I am the author of RcppAlgos
