Working with "very very" large arrays


Problem description

I need to work with very large arrays of small types (int or float arrays). I'm targeting x64 only, on machines with a lot of RAM; physical memory is never the issue in my scenarios. While looking at the documentation for gcAllowVeryLargeObjects, I noticed this point:

•The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0X7FEFFFFF) for other types.

Now, my issue is that I actually "need" to work with larger arrays than that. What would be the appropriate workaround here: creating arrays of arrays, or some other abstraction?

Given that I mostly need to access those arrays sequentially (never random reads, but often different segments read sequentially by different threads, potentially 100+ threads at once), what would my best bet be?

I may need to hold arrays of up to 65,536,000,000 elements or more.
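To put those numbers in perspective, here is a small C# sketch of the arithmetic. It assumes 4-byte int or float elements and treats the quoted 0x7FEFFFFF limit as the most elements a single array can hold; ArraySizing and its methods are illustrative names, not part of any library.

```csharp
using System;

// Back-of-the-envelope sizing for the numbers in the question.
// Assumption: elements are 4-byte int or float values.
public static class ArraySizing
{
    // Per-dimension limit for non-byte element types quoted from the
    // gcAllowVeryLargeObjects documentation (0x7FEFFFFF == 2,146,435,071).
    public const long PerArrayLimit = 0x7FEFFFFF;

    // Fewest arrays needed if every array is filled up to the limit (ceiling division).
    public static long MinArrays(long elements) =>
        (elements + PerArrayLimit - 1) / PerArrayLimit;

    // Payload size in bytes, ignoring per-array object headers.
    public static long TotalBytes(long elements, int bytesPerElement) =>
        elements * bytesPerElement;

    public static void Main()
    {
        long wanted = 65_536_000_000L;
        long bytes = TotalBytes(wanted, sizeof(int));
        double gib = bytes / (double)(1L << 30);
        Console.WriteLine($"arrays needed: {MinArrays(wanted)}");
        Console.WriteLine($"payload: {bytes} bytes ({gib} GiB)");
    }
}
```

For 65,536,000,000 elements this works out to at least 31 separate arrays and 262,144,000,000 bytes of payload (about 244 GiB), before any per-array overhead.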

Solution

If you really must break the array length limit, then you'll have to split the array into chunks of suitable size. You can wrap those chunks together in a container that has the appropriate semantics, like the BigArrayOfLong object that James McCaffrey blogged about a while back. There are numerous others like it.

The basic idea is that you use a jagged array to allocate the space you're going to use. Note that a multi-dimensional array won't give you any advantage: it is still a single object, and even with gcAllowVeryLargeObjects enabled a single array can hold at most UInt32.MaxValue elements in total. A jagged array, by contrast, is an array of smaller arrays, each of which is its own object in (probably non-contiguous) memory.
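A minimal sketch of the jagged-array idea, using a deliberately tiny, hypothetical chunk size of 4 so the mapping is easy to follow: a logical long index is split into a chunk number (index / chunkSize) and an offset (index % chunkSize), and every chunk is allocated as its own array. JaggedIndexing, Allocate and Locate are illustrative names.

```csharp
using System;

// Jagged-array idea in miniature: a logical long index maps to (chunk, offset),
// and each chunk is a separately allocated array.
// ChunkSize = 4 is a hypothetical, deliberately tiny value for illustration.
public static class JaggedIndexing
{
    public const int ChunkSize = 4;

    // Split a logical index into its chunk number and the offset inside that chunk.
    public static (int Chunk, int Offset) Locate(long index) =>
        ((int)(index / ChunkSize), (int)(index % ChunkSize));

    // Allocate enough chunks for 'length' elements; only the last chunk may be shorter.
    public static int[][] Allocate(long length)
    {
        int nChunks = (int)((length + ChunkSize - 1) / ChunkSize);
        var chunks = new int[nChunks][];
        for (int i = 0; i < nChunks; i++)
        {
            long remaining = length - (long)i * ChunkSize;
            chunks[i] = new int[(int)Math.Min(ChunkSize, remaining)];
        }
        return chunks;
    }

    public static void Main()
    {
        int[][] chunks = Allocate(10); // chunk lengths 4, 4, 2

        // Write every logical index through the (chunk, offset) mapping.
        for (long i = 0; i < 10; i++)
        {
            var (c, o) = Locate(i);
            chunks[c][o] = (int)i;
        }

        var (chunk, offset) = Locate(9);
        Console.WriteLine($"index 9 -> chunk {chunk}, offset {offset}");

        // Each chunk is its own object...
        Console.WriteLine(ReferenceEquals(chunks[0], chunks[1]));

        // ...whereas a rectangular array is one object covering every element.
        var rect = new int[3, 4];
        Console.WriteLine(rect.Length);
    }
}
```

With 10 elements the chunks hold 4, 4 and 2 items, and logical index 9 lands in chunk 2 at offset 1. The rectangular int[3, 4], by contrast, is a single object whose Length is 12.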

Here's a very simple (and not particularly optimal) implementation:

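using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Fixed-capacity array addressed by a long index, stored as a jagged array of
// chunks so that no single .NET array has to exceed the per-dimension limit.
public class HugeArray<T> : IEnumerable<T>
    where T : struct
{
    // Elements per chunk: roughly 128 MB worth of T ((Int32.MaxValue >> 4) bytes).
    // Marshal.SizeOf<T> reports the unmanaged size, which can differ from the managed size for some types.
    // readonly, so the chunk size cannot change after chunks have been allocated.
    public static readonly int arysize = (Int32.MaxValue >> 4) / Marshal.SizeOf<T>();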

    public readonly long Capacity;
    private readonly T[][] content;

    public T this[long index]
    {
        get
        {
            if (index < 0 || index >= Capacity)
                throw new IndexOutOfRangeException();
            int chunk = (int)(index / arysize);
            int offset = (int)(index % arysize);
            return content[chunk][offset];
        }
        set
        {
            if (index < 0 || index >= Capacity)
                throw new IndexOutOfRangeException();
            int chunk = (int)(index / arysize);
            int offset = (int)(index % arysize);
            content[chunk][offset] = value;
        }
    }

    public HugeArray(long capacity)
    {
        Capacity = capacity;
        int nChunks = (int)(capacity / arysize);
        int nRemainder = (int)(capacity % arysize);

        if (nRemainder == 0)
            content = new T[nChunks][];
        else
            content = new T[nChunks + 1][];

        for (int i = 0; i < nChunks; i++)
            content[i] = new T[arysize];
        if (nRemainder > 0)
            content[content.Length - 1] = new T[nRemainder];
    }

    public IEnumerator<T> GetEnumerator()
    {
        return content.SelectMany(c => c).GetEnumerator();
    }

    IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
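The access pattern in the question (many threads, each reading its own contiguous segment from start to finish) does not need a division and modulo for every element. Below is a sketch that assumes the chunks are available as a plain jagged array laid out like HugeArray's content field (full-size chunks, only the last one shorter); since that field is private in the class above, using it this way would mean exposing it. SumSegment converts the start index to (chunk, offset) once and then walks whole chunk slices; SumParallel splits the range into equal segments and runs them with Parallel.For. All names are hypothetical, and summing stands in for whatever per-element work is actually needed.

```csharp
using System;
using System.Threading.Tasks;

// Sketch: many workers, each scanning its own contiguous segment of a chunked array.
// Assumption: 'chunks' is laid out like HugeArray's private content field
// (every chunk has chunkSize elements except possibly the last one).
// SegmentedReads and its methods are hypothetical names; summing stands in for real work.
public static class SegmentedReads
{
    // Sum elements [start, end): locate the start once, then walk whole chunk slices.
    public static long SumSegment(int[][] chunks, int chunkSize, long start, long end)
    {
        long sum = 0;
        int chunk = (int)(start / chunkSize);
        int offset = (int)(start % chunkSize);
        long remaining = end - start;
        while (remaining > 0)
        {
            int[] current = chunks[chunk];
            int take = (int)Math.Min(current.Length - offset, remaining);
            for (int i = offset; i < offset + take; i++)
                sum += current[i]; // tight sequential loop, no per-element div/mod
            remaining -= take;
            chunk++;
            offset = 0; // later chunks are read from their beginning
        }
        return sum;
    }

    // Split [0, length) into 'workers' roughly equal segments and scan them in parallel.
    public static long SumParallel(int[][] chunks, int chunkSize, long length, int workers)
    {
        long segment = (length + workers - 1) / workers;
        var partial = new long[workers];
        Parallel.For(0, workers, w =>
        {
            long start = Math.Min(length, w * segment);
            long end = Math.Min(length, start + segment);
            partial[w] = SumSegment(chunks, chunkSize, start, end);
        });
        long total = 0;
        foreach (long p in partial) total += p;
        return total;
    }

    // Build test data where each element holds its own logical index.
    public static int[][] Build(long length, int chunkSize)
    {
        int nChunks = (int)((length + chunkSize - 1) / chunkSize);
        var chunks = new int[nChunks][];
        for (int c = 0; c < nChunks; c++)
        {
            int size = (int)Math.Min(chunkSize, length - (long)c * chunkSize);
            chunks[c] = new int[size];
            for (int i = 0; i < size; i++)
                chunks[c][i] = (int)((long)c * chunkSize + i);
        }
        return chunks;
    }

    public static void Main()
    {
        int[][] data = Build(10_500, 1000);
        Console.WriteLine(SumParallel(data, 1000, 10_500, 8));
    }
}
```

Main should print 55119750, the sum 0 + 1 + ... + 10499. Parallel.For runs the segments on the thread pool, so the number of segments and the number of threads in use at the same moment do not have to match.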

The HugeArray class above allocates all of its chunks up front for a fixed capacity, but it's not too hard to make one that grows to fit demand. Just make sure that the chunk size you choose isn't completely out of range; I've gone with a calculation based on the item size just in case.
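One way a growable version could look, as a sketch: chunks are kept in a List<T[]>, and a new chunk is allocated only when the last one is full, so existing elements are never copied the way List<T> copies its backing array when it grows. GrowableHugeArray, ChunkCount and GrowableDemo are hypothetical names, and the chunk size is passed in rather than derived from the item size.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical growable variant: a new chunk is allocated only when the last one
// is full, so existing elements are never copied as the collection grows.
public class GrowableHugeArray<T> where T : struct
{
    private readonly int chunkSize;
    private readonly List<T[]> chunks = new List<T[]>();

    public long Count { get; private set; }
    public int ChunkCount => chunks.Count;

    public GrowableHugeArray(int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        this.chunkSize = chunkSize;
    }

    public void Add(T item)
    {
        int offset = (int)(Count % chunkSize);
        if (offset == 0)
            chunks.Add(new T[chunkSize]); // no chunk yet, or the last one is full
        chunks[chunks.Count - 1][offset] = item;
        Count++;
    }

    public T this[long index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new IndexOutOfRangeException();
            return chunks[(int)(index / chunkSize)][(int)(index % chunkSize)];
        }
        set
        {
            if (index < 0 || index >= Count)
                throw new IndexOutOfRangeException();
            chunks[(int)(index / chunkSize)][(int)(index % chunkSize)] = value;
        }
    }
}

public static class GrowableDemo
{
    public static void Main()
    {
        var values = new GrowableHugeArray<int>(3); // tiny chunk size for illustration
        for (int i = 0; i < 7; i++)
            values.Add(i * 10);
        Console.WriteLine($"{values.Count} items in {values.ChunkCount} chunks, last = {values[6]}");
    }
}
```

The demo prints "7 items in 3 chunks, last = 60". Because each chunk is allocated at full size, up to chunkSize - 1 slots at the end may sit unused; in a real version you would probably size chunks from the element size the way HugeArray does.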
