Working with "very very" large arrays

Problem description

I need to work with very large arrays of small types (int or float arrays). I'm targeting x64 only, on machines with a lot of RAM, so physical memory is never the issue in my scenarios. While looking at the documentation for gcAllowVeryLargeObjects I noticed this point:

•The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0X7FEFFFFF) for other types.

Now my issue is that I actually need to work with arrays larger than that. What would be the appropriate workaround here? Creating arrays of arrays, or some other abstraction?

Knowing that I mostly need to access those arrays sequentially (never random reads, but often different segments being read sequentially by different threads, potentially 100+ threads at once), what would my best bet be?

I may need to hold arrays of up to 65,536,000,000 elements or more.

Solution

If you really must break the array length limit then you'll have to split the array into chunks of suitable size. You can wrap those chunks together in a container that has the appropriate semantics, like the BigArrayOfLong object that James McCaffrey blogged about a while back. There are numerous others like it.

The basic idea is to use a jagged array to allocate the space you're going to use. Note that a multi-dimensional array won't help here, since it is still a single object and therefore still bound by the runtime's per-object limits, while a jagged array is an array of smaller arrays, each of which is its own object in (probably non-contiguous) memory.
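
To make the distinction concrete, here is a minimal sketch (the class name `ArrayShapes` and the tiny sizes are only for illustration, not part of the original answer). It shows that a rectangular array is one object covering every element, whereas each row of a jagged array is allocated separately:

```csharp
using System;

static class ArrayShapes
{
    static void Main()
    {
        // Rectangular (multi-dimensional) array: a single object holding all 2 * 3 elements.
        int[,] grid = new int[2, 3];
        Console.WriteLine(grid.Length);      // total element count across both dimensions

        // Jagged array: an outer array of references to independently allocated rows.
        int[][] jagged = new int[2][];
        jagged[0] = new int[3];
        jagged[1] = new int[3];
        Console.WriteLine(jagged.Length);    // number of rows only

        // The rows are distinct heap objects, so each one is sized (and limited) on its own.
        Console.WriteLine(object.ReferenceEquals(jagged[0], jagged[1]));
    }
}
```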

Here's a very simple (and not particularly optimal) implementation:

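using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

public class HugeArray<T> : IEnumerable<T>
    where T : struct
{
    // Elements per chunk: about 128 MB of data per inner array, scaled by element size.
    public static readonly int arysize = (Int32.MaxValue >> 4) / Marshal.SizeOf<T>();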

    public readonly long Capacity;
    private readonly T[][] content;

    public T this[long index]
    {
        get
        {
            if (index < 0 || index >= Capacity)
                throw new IndexOutOfRangeException();
            int chunk = (int)(index / arysize);
            int offset = (int)(index % arysize);
            return content[chunk][offset];
        }
        set
        {
            if (index < 0 || index >= Capacity)
                throw new IndexOutOfRangeException();
            int chunk = (int)(index / arysize);
            int offset = (int)(index % arysize);
            content[chunk][offset] = value;
        }
    }

    public HugeArray(long capacity)
    {
        Capacity = capacity;
        int nChunks = (int)(capacity / arysize);
        int nRemainder = (int)(capacity % arysize);

        if (nRemainder == 0)
            content = new T[nChunks][];
        else
            content = new T[nChunks + 1][];

        for (int i = 0; i < nChunks; i++)
            content[i] = new T[arysize];
        if (nRemainder > 0)
            content[content.Length - 1] = new T[nRemainder];
    }

    public IEnumerator<T> GetEnumerator()
    {
        return content.SelectMany(c => c).GetEnumerator();
    }

    IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
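
Since the question describes sequential reads of different segments by many threads, one possible refinement is to walk each segment chunk by chunk, so the division and bounds work happens once per chunk rather than once per element. The following is only a sketch under assumptions: `SegmentReader` and `SumRange` are hypothetical names, it works on a bare `int[][]` because the `HugeArray<T>` class above keeps its inner arrays private, and the chunk size is tiny purely for demonstration.

```csharp
using System;
using System.Threading.Tasks;

static class SegmentReader
{
    // Sums elements [start, start + count) of a chunked array in which every chunk
    // except possibly the last holds exactly chunkSize elements.
    public static long SumRange(int[][] chunks, int chunkSize, long start, long count)
    {
        long total = 0;
        long end = start + count;
        long pos = start;
        while (pos < end)
        {
            int chunk = (int)(pos / chunkSize);
            int offset = (int)(pos % chunkSize);
            int[] inner = chunks[chunk];
            // Read to the end of this chunk or the end of the segment, whichever comes first.
            int stop = (int)Math.Min(inner.Length, offset + (end - pos));
            for (int i = offset; i < stop; i++)
                total += inner[i];
            pos += stop - offset;
        }
        return total;
    }

    static void Main()
    {
        const int chunkSize = 4;   // deliberately tiny, for illustration only
        int[][] chunks = { new[] { 1, 2, 3, 4 }, new[] { 5, 6, 7, 8 }, new[] { 9, 10 } };
        long length = 10;
        int workers = 3;
        long perWorker = (length + workers - 1) / workers;
        long[] partial = new long[workers];

        // Each worker reads its own contiguous segment; concurrent reads need no locking.
        Parallel.For(0, workers, w =>
        {
            long start = w * perWorker;
            long count = Math.Min(perWorker, length - start);
            if (count > 0)
                partial[w] = SumRange(chunks, chunkSize, start, count);
        });

        long sum = 0;
        foreach (long p in partial) sum += p;
        Console.WriteLine(sum);    // prints 55
    }
}
```

Each worker writes only to its own slot of `partial`, so the only shared state is the read-only data itself.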

This one has a fixed size that is set at construction, but it's not too hard to make one that grows to fit demand. Just make sure that the chunk size you pick stays well within the runtime's array limits. I've derived it from the element size just in case.
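
As a rough illustration of the growing variant, the sketch below appends a new chunk whenever the last one fills up, so existing data is never copied. `GrowableHugeArray<T>` and its members are hypothetical names, and the chunk size is passed in by the caller here (an assumption made to keep the example small) rather than computed from the element size.

```csharp
using System;
using System.Collections.Generic;

public class GrowableHugeArray<T> where T : struct
{
    private readonly int chunkSize;
    private readonly List<T[]> chunks = new List<T[]>();

    // Number of elements added so far.
    public long Count { get; private set; }

    public GrowableHugeArray(int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        this.chunkSize = chunkSize;
    }

    public void Add(T item)
    {
        int offset = (int)(Count % chunkSize);
        // An offset of zero means every existing chunk is full (or none exists yet).
        if (offset == 0)
            chunks.Add(new T[chunkSize]);
        chunks[chunks.Count - 1][offset] = item;
        Count++;
    }

    public T this[long index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new IndexOutOfRangeException();
            return chunks[(int)(index / chunkSize)][(int)(index % chunkSize)];
        }
        set
        {
            if (index < 0 || index >= Count)
                throw new IndexOutOfRangeException();
            chunks[(int)(index / chunkSize)][(int)(index % chunkSize)] = value;
        }
    }
}
```

Only the small list of chunk references is ever reallocated as the container grows; the chunks themselves stay where they are.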
