I need a very large array length (size) in C#

Question

public double[] result = new double[ ??? ];

I am storing results, and the total number of results is larger than 2,147,483,647, the maximum value of an Int32.

I tried BigInteger, ulong, etc. as the array size, but all of them gave me errors.

How can I extend the size of the array so that it can store more than 50,147,483,647 results (doubles)?

Thanks...

Recommended answer

An array of 2,147,483,648 doubles occupies 16 GB of memory (2^31 elements at 8 bytes each). For some people, that's not a big deal: I've got servers that won't even bother to hit the page file if I allocate a few of those arrays. That doesn't mean it's a good idea. The 50,147,483,647 doubles from the question would need roughly 400 GB.

When you are dealing with huge amounts of data like that, you should be looking to minimize the memory impact of the process. There are several ways to go about this, depending on how you're working with the data.

If your array is sparsely populated (lots of default/empty values with a small percentage of actually valid/useful data), then a sparse array can drastically reduce the memory requirements. You can write various implementations to optimize for different distribution profiles: random distribution, grouped values, arbitrary contiguous groups, etc.

A sparse array works fine for any type of contained data, including complex classes. It has some overhead, though, so it can actually be worse than a plain array when the fill percentage is high. And of course you still need memory to store the actual data.
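As a rough illustration of the sparse-array idea, here is a minimal sketch. The class name `SparseArray<T>` and its members are hypothetical, not an existing .NET type, and the dictionary backing is only one possible choice (reasonable for randomly distributed values). Each stored entry carries dictionary overhead on top of the value itself, which is why a densely filled instance can end up larger than a plain array.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sparse array: only values that differ from the default are stored.
public sealed class SparseArray<T>
{
    private readonly Dictionary<long, T> _values = new Dictionary<long, T>();
    private readonly T _default;

    public SparseArray(long length, T defaultValue = default(T))
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        _default = defaultValue;
    }

    // Logical length; may exceed int.MaxValue because nothing is preallocated.
    public long Length { get; }

    // Number of entries that actually consume memory.
    public int StoredCount => _values.Count;

    public T this[long index]
    {
        get
        {
            CheckIndex(index);
            return _values.TryGetValue(index, out T value) ? value : _default;
        }
        set
        {
            CheckIndex(index);
            // Writing the default value frees the slot instead of storing it.
            if (EqualityComparer<T>.Default.Equals(value, _default))
                _values.Remove(index);
            else
                _values[index] = value;
        }
    }

    private void CheckIndex(long index)
    {
        if (index < 0 || index >= Length) throw new IndexOutOfRangeException();
    }
}
```

With this sketch, `new SparseArray<double>(50_147_483_647L)` costs almost nothing until values are written; memory grows with the number of non-default entries, not with the logical length.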

Store the data on disk, create a read/write FileStream for the file, and enclose that in a wrapper that lets you access the file's contents as if it were an in-memory array. The simplest implementation of this is reasonably useful for sequential reads from the file. Random reads and writes can slow you down, but you can do some buffering in the background to help mitigate the speed issues.

This approach works for any type that has a fixed size, including structures that can be copied to and from a range of bytes in the file. It doesn't work for dynamically sized data like strings.
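The wrapper described above might look something like the following sketch for doubles. `FileBackedDoubleArray` is a hypothetical name, and this is the simplest variant: every access seeks and then reads or writes 8 bytes, with no buffering beyond what FileStream does internally. Values are stored in the machine's native byte order, which is an assumption that matters only if the file moves between platforms.

```csharp
using System;
using System.IO;

// Hypothetical wrapper exposing a file of doubles through an array-like indexer.
public sealed class FileBackedDoubleArray : IDisposable
{
    private readonly FileStream _stream;
    private readonly byte[] _buffer = new byte[sizeof(double)];

    public FileBackedDoubleArray(string path, long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        _stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite);
        // Size the file to hold every element; do not rely on the contents of
        // regions that have never been written.
        _stream.SetLength(length * sizeof(double));
    }

    public long Length { get; }

    public double this[long index]
    {
        get
        {
            Seek(index);
            int read = 0;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (read < _buffer.Length)
            {
                int n = _stream.Read(_buffer, read, _buffer.Length - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            return BitConverter.ToDouble(_buffer, 0);
        }
        set
        {
            Seek(index);
            byte[] bytes = BitConverter.GetBytes(value);
            _stream.Write(bytes, 0, bytes.Length);
        }
    }

    private void Seek(long index)
    {
        if (index < 0 || index >= Length) throw new IndexOutOfRangeException();
        // Element i lives at byte offset i * 8.
        _stream.Seek(index * sizeof(double), SeekOrigin.Begin);
    }

    public void Dispose() => _stream.Dispose();
}
```

Because the indexer takes a `long`, the element count is limited by disk space rather than by the maximum array length. A more serious implementation would likely cache blocks of elements to make random access cheaper.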

If you need to handle dynamic-size records, sparse data, etc., then you might be able to design a file format that handles it elegantly. Then again, a database is probably a better option at that point.

Memory-mapped files are the same idea as the file-backed option above, but they use a different mechanism to access the data. See the MemoryMappedFile class in the System.IO.MemoryMappedFiles namespace for more information on how to use memory-mapped files from .NET.
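A memory-mapped variant could be sketched as follows. `MappedDoubleArray` is a hypothetical name; the calls it relies on (`MemoryMappedFile.CreateFromFile`, `CreateViewAccessor`, `ReadDouble`, `Write`) are part of System.IO.MemoryMappedFiles. The sketch maps the whole file in one view, which assumes a 64-bit process with enough virtual address space; for very large files you might prefer to map smaller windows on demand.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical array-like wrapper over a memory-mapped file of doubles.
public sealed class MappedDoubleArray : IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _view;

    public MappedDoubleArray(string path, long length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        // Capacity is in bytes; the file is grown to this size if it is smaller.
        _file = MemoryMappedFile.CreateFromFile(
            path, FileMode.OpenOrCreate, null, length * sizeof(double));
        // One view over the entire file (assumes a 64-bit process for big files).
        _view = _file.CreateViewAccessor();
    }

    public long Length { get; }

    public double this[long index]
    {
        get
        {
            CheckIndex(index);
            return _view.ReadDouble(index * sizeof(double));
        }
        set
        {
            CheckIndex(index);
            _view.Write(index * sizeof(double), value);
        }
    }

    // Explicit check: the view itself may be rounded up to a page boundary.
    private void CheckIndex(long index)
    {
        if (index < 0 || index >= Length) throw new IndexOutOfRangeException();
    }

    public void Dispose()
    {
        _view.Dispose();
        _file.Dispose();
    }
}
```

Here the operating system pages data in and out as it is touched, so no explicit seek or buffer management is needed in the wrapper.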

Depending on the nature of the data, storing it in a database might work for you. For a large array of doubles, however, this is unlikely to be a great option. There are the overheads of reading and writing data in the database, plus the storage overheads: each row will at least need a row identity, probably a BIGINT (8-byte integer) for a large recordset, which doubles the size of the data right off the bat. Add in the overheads for indexing, row storage, etc. and you can very easily multiply the size of your data.

Databases are great for storing and manipulating complicated data. That's what they're for. If you have variable-width data (strings and the like), then a database is probably one of your best options. The flip side is that they're generally not an optimal solution for working with large amounts of very simple data.

Whichever option you go with, you can create an IList<T>-compatible class that encapsulates your data. This lets you write code that doesn't need to know how the data is stored, only what it is. Keep in mind that IList<T> itself uses int for its indexer and Count, so for more than 2,147,483,647 elements you would need a similar interface of your own that uses long indices.
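To illustrate the encapsulation point, here is a small sketch. Since `IList<T>` indexes with `int`, the example uses a hypothetical `ILargeArray<T>` interface with `long` indices in its place; the interface, the `InMemoryLargeArray` class, and the `LargeArrayMath.Sum` helper are all made-up names for illustration. The in-memory implementation exists only so the example runs; a sparse, file-backed, or memory-mapped class could implement the same interface.

```csharp
using System;

// Hypothetical interface: the role IList<T> would play, but with long indices.
public interface ILargeArray<T>
{
    long Length { get; }
    T this[long index] { get; set; }
}

// Trivial implementation over a regular array, used only to make the sketch runnable.
public sealed class InMemoryLargeArray : ILargeArray<double>
{
    private readonly double[] _data;

    public InMemoryLargeArray(long length)
    {
        _data = new double[length];
    }

    public long Length => _data.LongLength;

    public double this[long index]
    {
        get => _data[index];
        set => _data[index] = value;
    }
}

public static class LargeArrayMath
{
    // Consumer code: depends only on the interface, not on how data is stored.
    public static double Sum(ILargeArray<double> values)
    {
        double total = 0;
        for (long i = 0; i < values.Length; i++)
            total += values[i];
        return total;
    }
}
```

Swapping the storage strategy then means changing only the line that constructs the array; `Sum` and any other consumer code stay as they are.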
