How do I read a large file from disk to a database without running out of memory?

Question

I feel embarrassed to ask this question, as I feel I should already know the answer. However, since I don't, I want to know how to read large files from disk into a database without getting an OutOfMemory exception. Specifically, I need to load CSV files (or, really, tab-delimited files).

I am experimenting with CSVReader, and specifically this code sample, but I'm sure I'm doing it wrong. Some of their other code samples show how you can read streaming files of any size, which is pretty much what I want (only I need to read from disk), but I don't know what type of IDataReader I could create to allow this.

I am reading directly from disk, and my attempt to ensure I never run out of memory by reading too much data at once is below. I can't help thinking that I should be able to use a BufferedFileReader or something similar, where I can point to the location of the file and specify a buffer size; then, since CsvDataReader expects an IDataReader as its first parameter, it could just use that. Please show me the error of my ways, let me be rid of my GetData method with its arbitrary file-chunking mechanism, and help me out with this basic problem.

    private void button3_Click(object sender, EventArgs e)
    {   
        totalNumberOfLinesInFile = GetNumberOfRecordsInFile();
        totalNumberOfLinesProcessed = 0; 

        while (totalNumberOfLinesProcessed < totalNumberOfLinesInFile)
        {
            TextReader tr = GetData();
            using (CsvDataReader csvData = new CsvDataReader(tr, '\t'))
            {
                csvData.Settings.HasHeaders = false;
                csvData.Settings.SkipEmptyRecords = true;
                csvData.Settings.TrimWhitespace = true;

                for (int i = 0; i < 30; i++) // known number of columns for testing purposes
                {
                    csvData.Columns.Add("varchar");
                }

                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(@"Data Source=XPDEVVM\XPDEV;Initial Catalog=MyTest;Integrated Security=SSPI;"))
                {
                    bulkCopy.DestinationTableName = "work.test";

                    for (int i = 0; i < 30; i++)
                    {
                        bulkCopy.ColumnMappings.Add(i, i); // map source column i to destination column i by ordinal
                    }

                    bulkCopy.WriteToServer(csvData);

                }
            }
        }
    }

    private TextReader GetData()
    {
        StringBuilder result = new StringBuilder();
        int totalDataLines = 0;
        using (FileStream fs = new FileStream(pathToFile, FileMode.Open, System.IO.FileAccess.Read, FileShare.ReadWrite))
        {
            using (StreamReader sr = new StreamReader(fs))
            {
                string line = string.Empty;
                while ((line = sr.ReadLine()) != null)
                {
                    if (line.StartsWith("D\t"))
                    {
                        totalDataLines++;
                        if (totalDataLines < 100000) // Arbitrary method of restricting how much data is read at once.
                        {
                            result.AppendLine(line);
                        }
                    }
                }
            }
        }
        totalNumberOfLinesProcessed += totalDataLines;
        return new StringReader(result.ToString());
    }


Recommended answer

This is probably not the answer you're looking for, but this is what BULK INSERT was designed for.
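As a rough illustration of that suggestion, a minimal T-SQL sketch might look like the following. The destination table `work.test` comes from the question's code; the file path is purely hypothetical, and the batch size simply mirrors the question's 100,000-line chunk rather than being a tuned value.

```sql
-- Sketch only: load a tab-delimited file straight into the table from the question.
-- Assumption: 'C:\data\input.tsv' is a hypothetical path. It is resolved on the
-- SQL Server machine, and the SQL Server service account must be able to read it.
BULK INSERT work.test
FROM 'C:\data\input.tsv'
WITH (
    FIELDTERMINATOR = '\t',   -- tab-delimited columns
    ROWTERMINATOR   = '\n',   -- one record per line
    BATCHSIZE       = 100000, -- commit every 100,000 rows instead of one huge transaction
    TABLOCK                   -- take a table-level lock for the duration of the load
);
```

Because the server reads the file itself, the client application never holds the data in memory. One caveat: the question's code keeps only lines starting with `D\t`, whereas BULK INSERT loads every row, so that filter would presumably have to be applied afterwards, for example by loading into a staging table first.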
