在Mathematica中对稀疏数组的有效替代(Outer)吗? [英] Efficient alternative to Outer on sparse arrays in Mathematica?

查看:113
本文介绍了在Mathematica中对稀疏数组的有效替代(Outer)吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有两个非常大的列表{a1,a2,…}和{b1,b2,…},其中所有ai和bj都是大型稀疏数组.为了提高内存效率,我将每个列表存储为一个综合的稀疏数组.

Suppose I have two very large lists {a1, a2, …} and {b1, b2, …} where all ai and bj are large sparse arrays. For the sake of memory efficiency I store each list as one comprehensive sparse array.

现在,我想在ai和bj的所有可能对上计算一些函数f,其中每个结果f [ai,bj]再次为稀疏数组.顺便说一下,所有这些稀疏数组都具有相同的维数.

Now I would like to compute some function f on all possible pairs of ai and bj where each result f[ai, bj] is a sparse array again. All these sparse arrays have the same dimensions, by the way.

Flatten[Outer[f, {a1, a2, ...}, {b1, b2, ...}, 1], 1]

返回所需的结果(原则上),它似乎消耗了过多的内存.尤其重要,因为返回值是稀疏数组的列表,而在我感兴趣的情况下,一个综合的稀疏数组的效率要高得多.

returns the desired result (in principle) it appears to consume excessive amounts of memory. Not the least because the return value is a list of sparse arrays whereas one comprehensive sparse array turns out much more efficient in my cases of interest.

上述Outer的使用是否有有效的替代方法?

Is there an efficient alternative to the above use of Outer?

更具体的示例:

{SparseArray[{{1, 1, 1, 1} -> 1, {2, 2, 2, 2} -> 1}],
 SparseArray[{{1, 1, 1, 2} -> 1, {2, 2, 2, 1} -> 1}],
 SparseArray[{{1, 1, 2, 1} -> 1, {2, 2, 1, 2} -> 1}],
 SparseArray[{{1, 1, 2, 2} -> -1, {2, 2, 1, 1} -> 1}],
 SparseArray[{{1, 2, 1, 1} -> 1, {2, 1, 2, 2} -> 1}],
 SparseArray[{{1, 2, 1, 2} -> 1, {2, 1, 2, 1} -> 1}],
 SparseArray[{{1, 2, 2, 1} -> -1, {2, 1, 1, 2} -> 1}],
 SparseArray[{{1, 2, 2, 2} -> 1, {2, 1, 1, 1} -> 1}]};
ByteCount[%]

list = SparseArray[%%]
ByteCount[%]

Flatten[Outer[Dot, list, list, 1], 1];
ByteCount[%]
list1x2 = SparseArray[%%]
ByteCount[%]

Flatten[Outer[Dot, list1x2, list, 1], 1];
ByteCount[%]
list1x3 = SparseArray[%%]
ByteCount[%]

Outer(稀疏数组的列表)的原始中间结果不仅效率极低,而且Outer似乎在计算过程中也会消耗太多内存.

etc. Not only are the raw intermediate results of Outer (lists of sparse arrays) extremely inefficient, Outer seems to consume way too much memory during the computation itself, too.

推荐答案

我将提出一种解决方案,该解决方案相当复杂,但在计算过程中只允许使用大约两倍的内存.将最终结果存储为SparseArray.为此付出的代价将是执行速度大大降低.

I will propose a solution which is rather complex but allows one to only use about twice as much memory during the computation as is needed to store the final result as a SparseArray. The price to pay for this will be a much slower execution.

这是代码.首先,略微修改(以解决高维稀疏数组)的稀疏数组构造-解构API,取自

Here is the code. First, a slightly modified (to address higher-dimensional sparse arrays) sparse array construction - deconstruction API, taken from this answer:

ClearAll[spart, getIC, getJR, getSparseData, getDefaultElement, 
  makeSparseArray];
HoldPattern[spart[SparseArray[s___], p_]] := {s}[[p]];
getIC[s_SparseArray] := spart[s, 4][[2, 1]];
getJR[s_SparseArray] := spart[s, 4][[2, 2]];
getSparseData[s_SparseArray] := spart[s, 4][[3]];
getDefaultElement[s_SparseArray] := spart[s, 3];
makeSparseArray[dims_List, jc_List, ir_List, data_List, defElem_: 0] :=
   SparseArray @@ {Automatic, dims, defElem, {1, {jc, ir}, data}};

迭代器

以下函数产生迭代器.迭代器是封装迭代过程的好方法.

Iterators

The following functions produce iterators. Iterators are a good way to encapsulate the iteration process.

ClearAll[makeTwoListIterator];
makeTwoListIterator[fname_Symbol, a_List, b_List] :=
  With[{indices = Flatten[Outer[List, a, b, 1], 1]},
   With[{len  = Length[indices]},
    Module[{i = 0},
      ClearAll[fname];
      fname[] := With[{ind = ++i}, indices[[ind]] /; ind <= len];
      fname[] := Null;
      fname[n_] := 
        With[{ind = i + 1}, i += n; 
           indices[[ind ;; Min[len, ind + n - 1]]] /; ind <= len];
      fname[n_] := Null;
 ]]];

请注意,我可以将以上功能实现为更多的内存-高效地使用而不在其中使用Outer,但是出于我们的目的,这不是主要问题.

Note that I could have implemented the above function more memory - efficiently and not use Outer in it, but for our purposes this won't be the major concern.

这是一个更专业的版本,它为二维索引对生成插入器.

Here is a more specialized version, which produces interators for pairs of 2-dimensional indices.

ClearAll[make2DIndexInterator];
make2DIndexInterator[fname_Symbol, i : {iStart_, iEnd_}, j : {jStart_, jEnd_}] :=
   makeTwoListIterator[fname, Range @@ i, Range @@ j];
 make2DIndexInterator[fname_Symbol, ilen_Integer, jlen_Integer] :=
   make2DIndexInterator[fname, {1, ilen}, {1, jlen}];

这是它的工作原理:

In[14]:= 
makeTwoListIterator[next,{a,b,c},{d,e}];
next[]
next[]
next[]

Out[15]= {a,d}
Out[16]= {a,e}
Out[17]= {b,d}

我们还可以使用它来获取批处理结果:

We can also use this to get batch results:

In[18]:= 
makeTwoListIterator[next,{a,b,c},{d,e}];
next[2]
next[2]

Out[19]= {{a,d},{a,e}}
Out[20]= {{b,d},{b,e}}

,我们将使用第二种形式.

, and we will be using this second form.

此函数将通过获取数据块(也以SparseArray形式)并将它们粘合在一起来迭代地构建SparseArray对象.它基本上是答案中使用的代码,打包成一个功能.它接受用于产生下一个数据块的代码段,并包裹在Hold中(我也可以将其打包为HoldAll)

This function will build a SparseArray object iteratively, by getting chunks of data (also in SparseArray form) and gluing them together. It is basically code used in this answer, packaged into a function. It accepts the code piece used to produce the next chunk of data, wrapped in Hold (I could alternatively make it HoldAll)

Clear[accumulateSparseArray];
accumulateSparseArray[Hold[getDataChunkCode_]] :=
  Module[{start, ic, jr, sparseData, dims, dataChunk},
   start = getDataChunkCode;
   ic = getIC[start];
   jr = getJR[start];
   sparseData = getSparseData[start];
   dims = Dimensions[start];
   While[True, dataChunk = getDataChunkCode;
     If[dataChunk === {}, Break[]];
     ic = Join[ic, Rest@getIC[dataChunk] + Last@ic];
     jr = Join[jr, getJR[dataChunk]];
     sparseData = Join[sparseData, getSparseData[dataChunk]];
     dims[[1]] += First[Dimensions[dataChunk]];
   ];
   makeSparseArray[dims, ic, jr, sparseData]];

将它们放在一起

此功能是主要功能,将所有功能组合在一起:

Putting it all together

This function is the main one, putting it all together:

ClearAll[sparseArrayOuter];
sparseArrayOuter[f_, a_SparseArray, b_SparseArray, chunkSize_: 100] := 
  Module[{next, wrapperF, getDataChunkCode},
    make2DIndexInterator[next, Length@a, Length@b];
    wrapperF[x_List, y_List] := SparseArray[f @@@ Transpose[{x, y}]];
    getDataChunkCode :=
      With[{inds = next[chunkSize]},
         If[inds === Null, Return[{}]];
         wrapperF[a[[#]] & /@ inds[[All, 1]], b[[#]] & /@ inds[[All, -1]]]
      ];
    accumulateSparseArray[Hold[getDataChunkCode]]
  ];

在这里,我们首先生成迭代器,该迭代器将根据需要提供索引对列表的一部分,用于提取元素(也为SparseArrays).请注意,我们通常一次从两个大输入SparseArray -s中提取一对以上的元素,以加快代码的速度.我们一次处理的对数由可选的chunkSize参数控制,该参数默认为100.然后,我们构造代码来处理这些元素,然后将结果放回SparseArray中,在这里我们使用辅助功能wrapperF.迭代器的使用不是绝对必要的(可以使用Reap-Sow,以及其他答案),但允许我将迭代逻辑与稀疏数组的一般累加逻辑脱钩.

Here, we first produce the iterator which will give us on demand portions of index pair list, used to extract the elements (also SparseArrays). Note that we will generally extract more than one pair of elements from two large input SparseArray-s at a time, to speed up the code. How many pairs we process at once is governed by the optional chunkSize parameter, which defaults to 100. We then construct the code to process these elements and put the result back into SparseArray, where we use an auxiliary function wrapperF. The use of iterators wasn't absolutely necessary (could use Reap-Sow instead, as with other answers), but allowed me to decouple the logic of iteration from the logic of generic accumulation of sparse arrays.

首先,我们准备大型稀疏数组并测试我们的功能:

First we prepare large sparse arrays and test our functionality:

In[49]:= 
arr = {SparseArray[{{1,1,1,1}->1,{2,2,2,2}->1}],SparseArray[{{1,1,1,2}->1,{2,2,2,1}->1}],
SparseArray[{{1,1,2,1}->1,{2,2,1,2}->1}],SparseArray[{{1,1,2,2}->-1,{2,2,1,1}->1}],
SparseArray[{{1,2,1,1}->1,{2,1,2,2}->1}],SparseArray[{{1,2,1,2}->1,{2,1,2,1}->1}]};

In[50]:= list=SparseArray[arr]
Out[50]= SparseArray[<12>,{6,2,2,2,2}]

In[51]:= larger = sparseArrayOuter[Dot,list,list]
Out[51]= SparseArray[<72>,{36,2,2,2,2,2,2}]

In[52]:= (large= sparseArrayOuter[Dot,larger,larger])//Timing
Out[52]= {0.047,SparseArray[<2592>,{1296,2,2,2,2,2,2,2,2,2,2}]}

In[53]:= SparseArray[Flatten[Outer[Dot,larger,larger,1],1]]==large
Out[53]= True

In[54]:= MaxMemoryUsed[]
Out[54]= 21347336

现在我们进行功率测试

In[55]:= (huge= sparseArrayOuter[Dot,large,large,2000])//Timing
Out[55]= {114.344,SparseArray[<3359232>,{1679616,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}]}

In[56]:= MaxMemoryUsed[]
Out[56]= 536941120

In[57]:= ByteCount[huge]
Out[57]= 262021120

In[58]:= (huge1 = Flatten[Outer[Dot,large,large,1],1]);//Timing
Out[58]= {8.687,Null}

In[59]:= MaxMemoryUsed[]
Out[59]= 2527281392

对于此特定示例,建议的方法的内存效率是直接使用Outer的5倍,但要慢15倍.我必须调整chunksize参数(默认值为100,但是对于上述情况,我使用了2000以获得最佳速度/内存使用组合).我的方法仅将峰值用作存储最终结果所需内存的两倍.与基于Outer的方法相比,内存节省的程度将取决于所讨论的稀疏数组.

For this particular example, the suggested method is 5 times more memory-efficient than the direct use of Outer, but about 15 times slower. I had to tweak the chunksize parameter (default is 100, but for the above I used 2000, to get the optimal speed / memory use combination). My method only used as a peak value twice as much memory as needed to store the final result. The degree of memory-savings as compared to Outer- based method will depend on the sparse arrays in question.

这篇关于在Mathematica中对稀疏数组的有效替代(Outer)吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆