How to append to a sparse domain in Chapel


Problem description

I'm populating a sparse array in Chapel with a loop that reads lines from a CSV file.

I'm wondering what the best pattern is.

var dnsDom = {1..n_dims, 1..n_dims};
var spsDom: sparse subdomain(dnsDom);
for line in file_reader.lines() {
   var i = line[1]:int;
   var j = line[2]:int;
   spsDom += (i,j);
}

Is this an efficient way of doing it?
Should I create a temporary array of tuples and append it to spsDom every (say) 10,000 rows?

Thanks!

Recommended answer

The approach shown in your snippet expands the internal arrays of the sparse domain at every += operation. As you suggested, buffering the indices you read and then adding them in bulk will definitely perform better, because adding an array of indices benefits from several optimizations.

You can also use += with an array on the right-hand side:

spsDom += arrayOfIndices;

This overload of the += operator on sparse domains actually calls the main bulk-addition method, bulkAdd. The method has several flags that may help you gain even more performance in some cases. Note that the += overload calls bulkAdd in the "safest" manner possible, i.e. the array of indices can be in random order, can include duplicates, and so on. If your arrays (in your case, the indices you read from the file) satisfy certain requirements (Are they ordered? Are they free of duplicates? Do you need to preserve the input array?), you can call bulkAdd directly and pass the corresponding optimization flags.

See http://chapel.cray.com/docs/latest/builtins/internal/ChapelArray.html#ChapelArray.bulkAdd for the documentation of bulkAdd.
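To make the "safest" behaviour of += concrete, here is a minimal, self-contained sketch. The domain size n and the index values are made up for illustration; the array is deliberately unsorted and contains a duplicate, which the += overload is described above as tolerating.

```chapel
// Sketch only: n and the index values below are illustrative assumptions.
config const n = 8;

const dnsDom = {1..n, 1..n};
var spsDom: sparse subdomain(dnsDom);

// Deliberately unsorted, with (3, 4) repeated.
var inds = [(3, 4), (1, 2), (3, 4), (2, 2)];

// += with an array on the right-hand side adds all indices in one bulk operation.
spsDom += inds;

writeln("number of stored indices: ", spsDom.size);
for ij in spsDom do writeln(ij);
```

If the buffer is already known to be sorted and duplicate-free, calling bulkAdd directly with the matching flags avoids the extra work that this safe path has to do.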

Edit: a snippet building on the one in the question:

var dnsDom = {1..n_dims, 1..n_dims};
var spsDom: sparse subdomain(dnsDom);

// create an index buffer that holds (i, j) tuples
config const indexBufferSize = 100;
var indexBufferDom = {0..#indexBufferSize};
var indexBuffer: [indexBufferDom] 2*int;

var count = 0;
for line in file_reader.lines() {

  indexBuffer[count] = (line[1]:int, line[2]:int);
  count += 1;

  // bulk add indices if the buffer is full
  if count == indexBufferSize {
    spsDom.bulkAdd(indexBuffer, dataSorted=true,
                                preserveInds=false,
                                isUnique=true);
    count = 0;
  }
}

// dump the final buffer that is (most likely) partially filled
spsDom.bulkAdd(indexBuffer[0..#count],  dataSorted=true,
                                        preserveInds=false,
                                        isUnique=true);

I haven't tested it, but I think this captures the basic idea. The flags passed to bulkAdd should give the best performance. Of course, this depends on the input buffer being sorted and free of duplicates. Also note that the initial bulkAdd will be much faster than subsequent ones, which will probably get slower because the method has to sift through the existing indices and shift them if necessary. So a larger buffer can deliver better performance.
