How to split large files efficiently


Problem description


I'd like to know how I can split a large file without using too many system resources. I'm currently using this code:

// Requires: using System.IO;
public static void SplitFile(string inputFile, int chunkSize, string path)
{
    byte[] buffer = new byte[chunkSize];

    using (Stream input = File.OpenRead(inputFile))
    {
        int index = 0;
        while (input.Position < input.Length)
        {
            using (Stream output = File.Create(path + "\\" + index))
            {
                int chunkBytesRead = 0;
                while (chunkBytesRead < chunkSize)
                {
                    int bytesRead = input.Read(buffer, 
                                               chunkBytesRead, 
                                               chunkSize - chunkBytesRead);

                    if (bytesRead == 0)
                    {
                        break;
                    }
                    chunkBytesRead += bytesRead;
                }
                output.Write(buffer, 0, chunkBytesRead);
            }
            index++;
        }
    }
}


The operation takes 52.370 seconds to split a 1.6 GB file into 14 MB files. I'm not concerned about how long the operation takes; I'm more concerned about the system resources used, as this app will be deployed to a shared hosting environment. Currently this operation maxes out my system's HDD IO usage at 100% and slows my system down considerably. CPU usage is low; RAM ramps up a bit, but seems fine.


Is there a way I can keep this operation from using too many resources?

Thanks

Recommended answer


It seems odd to assemble each output file in memory; I suspect you should be using a smaller inner buffer (maybe 20k or so) and calling Write more frequently.


Ultimately, if you need IO, you need IO. If you want to be courteous to a shared hosting environment, you could add deliberate pauses: maybe short pauses within the inner loop, and a longer pause (maybe 1 s) in the outer loop. This won't affect your overall timing much, but may help other processes get some IO.


Example of a buffer for the inner-loop:

// Requires: using System; using System.IO; using System.Threading;
public static void SplitFile(string inputFile, int chunkSize, string path)
{
    const int BUFFER_SIZE = 20 * 1024;
    byte[] buffer = new byte[BUFFER_SIZE];

    using (Stream input = File.OpenRead(inputFile))
    {
        int index = 0;
        while (input.Position < input.Length)
        {
            using (Stream output = File.Create(path + "\\" + index))
            {
                int remaining = chunkSize, bytesRead;
                while (remaining > 0 && (bytesRead = input.Read(buffer, 0,
                        Math.Min(remaining, BUFFER_SIZE))) > 0)
                {
                    output.Write(buffer, 0, bytesRead);
                    remaining -= bytesRead;
                }
            }
            index++;
            Thread.Sleep(500); // experimental; perhaps try it
        }
    }
}
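For completeness, here is a self-contained sketch that exercises the buffered-split loop above on a small synthetic file and verifies the chunks reassemble into the original bytes. The file sizes, temp-directory paths, and the verification step are illustrative additions, not from the original post; the pause is omitted because the test file is tiny.

```csharp
using System;
using System.IO;
using System.Linq;

class SplitDemo
{
    const int BUFFER_SIZE = 20 * 1024; // small inner buffer, as suggested above

    static void SplitFile(string inputFile, int chunkSize, string path)
    {
        byte[] buffer = new byte[BUFFER_SIZE];
        using (Stream input = File.OpenRead(inputFile))
        {
            int index = 0;
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(Path.Combine(path, index.ToString())))
                {
                    int remaining = chunkSize, bytesRead;
                    while (remaining > 0 && (bytesRead = input.Read(buffer, 0,
                            Math.Min(remaining, BUFFER_SIZE))) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                        remaining -= bytesRead;
                    }
                }
                index++;
            }
        }
    }

    static void Main()
    {
        // Hypothetical temp locations for illustration only.
        string dir = Path.Combine(Path.GetTempPath(), "split-demo");
        string outDir = Path.Combine(dir, "chunks");
        Directory.CreateDirectory(outDir);

        // 100,000 bytes split into 30,000-byte chunks -> 4 chunk files (0..3).
        string input = Path.Combine(dir, "input.bin");
        byte[] data = new byte[100000];
        new Random(42).NextBytes(data);
        File.WriteAllBytes(input, data);

        SplitFile(input, 30000, outDir);

        var reassembled = Enumerable.Range(0, 4)
            .SelectMany(i => File.ReadAllBytes(Path.Combine(outDir, i.ToString())));
        Console.WriteLine(reassembled.SequenceEqual(data) ? "OK" : "MISMATCH");
    }
}
```

Note that the sketch uses `Path.Combine` rather than the `path + "\\" + index` concatenation from the question; that is a portability tweak, not part of the original answer.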

