Version of C# StringBuilder to allow for strings larger than 2 billion characters

Question

In C#, 64-bit Windows + .NET 4.5 (or later) + enabling gcAllowVeryLargeObjects in the App.config file allows for objects larger than two gigabytes. That's cool, but unfortunately, the maximum number of elements that .NET allows in a character array is still limited to about 2^31 = 2.15 billion chars. Testing confirmed this.

To overcome this, Microsoft recommends (in their Option B) creating the arrays natively; their Option C doesn't even compile. That suits me, as speed is also a concern. Is there some tried and trusted unsafe / native / interop / P/Invoke code for .NET out there that can replace and act as an enhanced StringBuilder to get around the 2 billion element limit?

Unsafe/pinvoke code is preferred, but not a deal breaker. Alternatively, is there a .NET (safe) version available?

Ideally, the StringBuilder replacement would start off small (preferably with a user-defined initial capacity) and then double in size each time its capacity is exceeded. I'm mostly looking for append() functionality here. Saving the string to a file would be useful too, though I'm sure I could program that part myself if substring() functionality is also included. If the code uses P/Invoke, then obviously some care with memory management is needed to avoid memory leaks.
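For comparison, here is a minimal sketch of the native route the question describes, assuming a 64-bit process (so `IntPtr` can express sizes above 2 GB). `NativeCharBuffer` is a hypothetical name, not an existing library type and not the answerer's code. It keeps UTF-16 chars in unmanaged memory from `Marshal.AllocHGlobal`, doubles the block with `Marshal.ReAllocHGlobal` when an append would overflow, and frees it in `Dispose` (with a finalizer as a fallback) so the memory is not leaked. It has not been tried at multi-gigabyte sizes.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical NativeCharBuffer: a growable UTF-16 buffer in unmanaged memory.
// Its size is bounded by available memory rather than the managed array element limit
// (assumes a 64-bit process).
public sealed class NativeCharBuffer : IDisposable
{
    private IntPtr buffer;   // start of the unmanaged block
    private long capacity;   // allocated size, in chars
    private long length;     // chars in use

    public NativeCharBuffer(long initialCapacity = 4096)
    {
        if (initialCapacity < 1) throw new ArgumentOutOfRangeException("initialCapacity");
        capacity = initialCapacity;
        buffer = Marshal.AllocHGlobal(new IntPtr(capacity * sizeof(char)));
    }

    public long Length { get { return length; } }

    public void Append(string s)
    {
        ThrowIfDisposed();
        long needed = length + s.Length;
        if (needed > capacity)
        {
            // Double the capacity until the new text fits
            long newCapacity = capacity;
            while (newCapacity < needed) newCapacity *= 2;
            buffer = Marshal.ReAllocHGlobal(buffer, new IntPtr(newCapacity * sizeof(char)));
            capacity = newCapacity;
        }
        // Copy the whole string into unmanaged memory in one call
        Marshal.Copy(s.ToCharArray(), 0, Address(length), s.Length);
        length = needed;
    }

    public char this[long index]
    {
        get
        {
            ThrowIfDisposed();
            if (index < 0 || index >= length) throw new ArgumentOutOfRangeException("index");
            return (char)Marshal.ReadInt16(Address(index));
        }
    }

    // Byte address of the char at 'index'
    private IntPtr Address(long index)
    {
        return new IntPtr(buffer.ToInt64() + index * sizeof(char));
    }

    private void ThrowIfDisposed()
    {
        if (buffer == IntPtr.Zero) throw new ObjectDisposedException("NativeCharBuffer");
    }

    // Release the unmanaged block exactly once
    private void Free()
    {
        if (buffer != IntPtr.Zero) { Marshal.FreeHGlobal(buffer); buffer = IntPtr.Zero; }
    }

    public void Dispose() { Free(); GC.SuppressFinalize(this); }

    ~NativeCharBuffer() { Free(); }   // fallback if Dispose is never called
}
```

Wrapping it in a `using` block (for example `using (var b = new NativeCharBuffer(2)) { b.Append("hello"); }`) ensures the unmanaged memory is released deterministically.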

I don't want to reinvent the wheel if some simple code already exists, but on the other hand, I don't want to download and incorporate a DLL just for this simple functionality.

I'm also using .NET 3.5 to cater for users who don't have the latest version of Windows.

Answer

So I ended up creating my own BigStringBuilder class. It's a list in which each element (or page) is a char array (type List<char[]>).
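To make the paging concrete: with pages of 2^pagedepth chars, a 64-bit index splits into a page number (the high bits) and a position within that page (the low bits). The sketch below isolates the shift-and-mask arithmetic that the class's indexer performs inline; `PagedIndex.Locate` is a hypothetical helper for illustration only.

```csharp
// Hypothetical helper: maps a logical char index to (page, offset)
// for pages of 2^pageDepth chars, as BigStringBuilder's indexer does inline.
public static class PagedIndex
{
    public static void Locate(long index, int pageDepth, out int page, out long offset)
    {
        long mask = (1L << pageDepth) - 1;   // e.g. 4095 when pageDepth = 12
        page = (int)(index >> pageDepth);    // which char[] in the list (index / 2^pageDepth)
        offset = index & mask;               // position inside that char[] (index % 2^pageDepth)
    }
}
```

With the default pagedepth of 12 (4,096-char pages), index 10,000 lands on page 2 at offset 1,808, since 2 × 4,096 + 1,808 = 10,000. An index beyond 2^31, such as 5,000,000,000, still works because only the page number has to fit in an int.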

Provided you're using 64-bit Windows, you can now easily surpass the 2 billion character element limit. I managed to create a giant string of around 32 gigabytes, roughly 16 billion chars at 2 bytes each (I needed to increase virtual memory in the OS first; otherwise I could only get to around 7 GB on my 8 GB RAM PC). I expect it can handle more than 32 GB. In theory, with sufficiently large pages, it should be able to handle around 1,000,000,000 * 1,000,000,000 chars, or one quintillion characters, which should be enough for anyone.
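The theoretical ceiling depends on the page size: the page number is cast to int, so there can be at most int.MaxValue pages, each holding 2^pagedepth chars. The sketch below spells out that arithmetic; `Capacity.MaxChars` is a hypothetical helper, and it ignores available memory as well as the runtime's slightly lower cap on array and List sizes.

```csharp
public static class Capacity
{
    // Upper bound on addressable chars: int.MaxValue pages * 2^pageDepth chars per page.
    // pageDepth must be at most 30, since a single char[] page cannot reach 2^31 elements.
    public static long MaxChars(int pageDepth)
    {
        if (pageDepth < 0 || pageDepth > 30) throw new System.ArgumentOutOfRangeException("pageDepth");
        return checked((long)int.MaxValue * (1L << pageDepth));
    }
}
```

With the default pagedepth of 12 the bound is about 8.8 × 10^12 chars; the quintillion range (about 2.3 × 10^18 at pagedepth = 30) is only reachable with much larger pages.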

Speed-wise, some quick tests showed it to be only around 33% slower than a StringBuilder when appending. I got very similar performance with a jagged char array (char[][]) instead of List<char[]>, but Lists are simpler to work with, so I stuck with them.
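For reference, here is a minimal sketch of the jagged-array variant mentioned above, assuming the same power-of-two paging. `JaggedCharStore` is a hypothetical name, not the answer's code. The indexer is identical; the difference is that the outer `char[][]` has to be grown by hand with `Array.Resize`, which is exactly the bookkeeping `List<char[]>` does for you.

```csharp
using System;

// Hypothetical JaggedCharStore: the same power-of-two paging, with pages in a jagged char[][]
public class JaggedCharStore
{
    private char[][] pages = new char[4][];   // outer array of pages, grown by hand
    private int pageCount;                    // pages actually allocated
    private readonly int pageDepth;
    private readonly long mask;               // 2^pageDepth - 1
    private long length;

    public JaggedCharStore(int pageDepth = 12)
    {
        this.pageDepth = pageDepth;
        mask = (1L << pageDepth) - 1;
    }

    public long Length { get { return length; } }

    public void Append(char ch)
    {
        int page = (int)(length >> pageDepth);
        if (page == pageCount)
        {
            // Unlike List<char[]>, the outer array does not grow by itself
            if (pageCount == pages.Length) Array.Resize(ref pages, pages.Length * 2);
            pages[pageCount++] = new char[1L << pageDepth];
        }
        pages[page][length & mask] = ch;
        length++;
    }

    // Same shift/mask indexer as BigStringBuilder
    public char this[long n]
    {
        get { return pages[(int)(n >> pageDepth)][n & mask]; }
    }
}
```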

Hope somebody else finds it useful! There may be bugs, so use with caution. I tested it fairly well though.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// A simplified version specially for StackOverflow
public class BigStringBuilder
{
    List<char[]> c = new List<char[]>();
    private int pagedepth;
    private long pagesize;
    private long mpagesize;         // https://stackoverflow.com/questions/11040646/faster-modulus-in-c-c
    private int currentPage = 0;
    private int currentPosInPage = 0;

    public BigStringBuilder(int pagedepth = 12) {   // pagesize is 2^pagedepth (must be a power of 2 for the fast shift/mask indexer)
        this.pagedepth = pagedepth;
        pagesize = 1L << pagedepth;         // exact integer power of two, no floating-point Math.Pow
        mpagesize = pagesize - 1;           // mask selecting the position within a page
        c.Add(new char[pagesize]);
    }

    // Indexer: address individual chars with square brackets, e.g. sb[n], using a 64-bit index
    public char this[long n]    {
        get { return c[(int)(n >> pagedepth)][n & mpagesize]; }
        set { c[(int)(n >> pagedepth)][n & mpagesize] = value; }
    }

    public string[] returnPagesForTestingPurposes() {
        string[] s = new string[currentPage + 1];
        for (int i = 0; i < currentPage + 1; i++) s[i] = new string(c[i]);
        return s;
    }
    public void clear() {
        c = new List<char[]>();
        c.Add(new char[pagesize]);
        currentPage = 0;
        currentPosInPage = 0;
    }


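    // Loads a text file, replacing the current contents
    public void fileOpen(string path)
    {
        clear();
        using (StreamReader sr = new StreamReader(path)) {
            int len;
            // ReadBlock returns fewer than pagesize chars only at end of stream,
            // so a full page means "advance to the next page and keep reading"
            while ((len = sr.ReadBlock(c[currentPage], 0, (int)pagesize)) == pagesize) {
                currentPage++;
                if (currentPage == c.Count) c.Add(new char[pagesize]);
            }
            // Always < pagesize, even when the file length is an exact multiple of pagesize
            currentPosInPage = len;
        }
    }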

    // See: https://stackoverflow.com/questions/373365/how-do-i-write-out-a-text-file-in-c-sharp-with-a-code-page-other-than-utf-8/373372
    public void fileSave(string path)   {
        // File.CreateText writes UTF-8; see the link above for other code pages
        using (StreamWriter sw = File.CreateText(path)) {
            for (int i = 0; i < currentPage; i++) sw.Write(c[i], 0, (int)pagesize);   // full pages, no temporary strings
            sw.Write(c[currentPage], 0, currentPosInPage);                             // partially filled last page
        }
    }

    public long length()    {
        return (long)currentPage * (long)pagesize + (long)currentPosInPage;
    }

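    // A single .NET string cannot hold more than about 2^30 chars, so the default cap stays below that
    public string ToString(long max = 1000000000)   {
        if (length() < max) return substring(0, length());
        else return substring(0, max);
    }

    // Returns the chars in the half-open range [x, y); y is an end index, not a length
    public string substring(long x, long y) {
        char[] buf = new char[y - x];
        for (long n = x; n < y; n++) buf[n - x] = c[(int)(n >> pagedepth)][n & mpagesize];
        return new string(buf);
    }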

    // True if 'find' occurs at position 'start'
    public bool match(string find, long start = 0)  {
        if (start < 0 || start + find.Length > length()) return false;   // would run past the end
        for (int i = 0; i < find.Length; i++) if (this[start + i] != find[i]) return false;
        return true;
    }
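    // Overwrites existing chars starting at pos; does not change length()
    public void replace(string s, long pos) {
        if (pos < 0 || pos + s.Length > length()) throw new System.ArgumentOutOfRangeException("pos");
        for (int i = 0; i < s.Length; i++)  {
            c[(int)(pos >> pagedepth)][pos & mpagesize] = s[i];
            pos++;
        }
    }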

    public void Append(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            c[currentPage][currentPosInPage] = s[i];
            currentPosInPage++;
            if (currentPosInPage == pagesize)
            {
                currentPosInPage = 0;
                currentPage++;
                if (currentPage == c.Count) c.Add(new char[pagesize]);
            }
        }
    }


}
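The Append and substring methods above move one char at a time, which may account for part of the gap against StringBuilder. A possible refinement, sketched here as a separate hypothetical class `PagedChars` (not the answer's code, and not benchmarked), copies whole runs per page with `String.CopyTo` and `Array.Copy`. It uses `/` and `%` for readability; for power-of-two page sizes the shift/mask form computes the same values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical PagedChars: same paging idea as BigStringBuilder,
// but Append/Substring copy whole runs per page instead of one char at a time.
public class PagedChars
{
    private readonly List<char[]> pages = new List<char[]>();
    private readonly int pageSize;
    private long length;

    public PagedChars(int pageDepth = 12)
    {
        pageSize = 1 << pageDepth;
        pages.Add(new char[pageSize]);
    }

    public long Length { get { return length; } }

    public void Append(string s)
    {
        int i = 0;
        while (i < s.Length)
        {
            int page = (int)(length / pageSize);
            int pos = (int)(length % pageSize);
            if (page == pages.Count) pages.Add(new char[pageSize]);
            int n = Math.Min(pageSize - pos, s.Length - i);   // chars that still fit in this page
            s.CopyTo(i, pages[page], pos, n);
            i += n;
            length += n;
        }
    }

    // Returns chars in [start, end), mirroring BigStringBuilder.substring(x, y)
    public string Substring(long start, long end)
    {
        if (start < 0 || start > end || end > length) throw new ArgumentOutOfRangeException();
        char[] result = new char[end - start];
        long done = 0;
        while (done < result.Length)
        {
            long n = start + done;
            int page = (int)(n / pageSize);
            int pos = (int)(n % pageSize);
            int count = (int)Math.Min(pageSize - pos, result.Length - done);   // rest of this page, or rest of the result
            Array.Copy(pages[page], pos, result, done, count);
            done += count;
        }
        return new string(result);
    }
}
```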
