How to read a csv file one line at a time and replace/edit certain lines as you go?


Problem description

I have a 60GB csv file that I need to make some modifications to. The customer wants some changes to the file's data, but I don't want to regenerate the data in that file because it took 4 days to produce.

How can I read the file line by line (not loading it all into memory!) and make edits to those lines as I go, replacing certain values etc.?

Recommended answer

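The process would be something like this:

  1. Open a StreamWriter to a temporary file.
  2. Open a StreamReader to the target file.
  3. For each line:
       1. Split the text into columns based on a delimiter.
       2. Check the columns for the values you want to replace, and replace them.
       3. Join the column values back together using your delimiter.
       4. Write the line to the temp file.
  4. When you are finished, delete the target file, and move the temp file to the target file path.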

Note regarding Steps 2 and 3.1: If you are confident in the structure of your file and it is simple enough, you can do all this out of the box as described (I'll include a sample in a moment). However, there are factors in a CSV file that may need attention, such as recognizing when a delimiter is being used literally inside a column value. You can work through this yourself, or try an existing solution.
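
To illustrate the quoted-delimiter caveat, here is a minimal sketch comparing a plain `string.Split` with a quote-aware regular expression. The sample line and the `SplitCsvLine` helper name are made up for illustration, and the pattern assumes that fields never contain line breaks or unbalanced quotes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample line: the second field contains a literal comma inside quotes.
var line = "1,\"Smith, John\",42";

// Naive split: cuts the quoted field in two.
string[] naive = line.Split(',');

// Quote-aware split: keeps the quoted field in one piece.
string[] aware = SplitCsvLine(line, ",");

Console.WriteLine(naive.Length); // 4
Console.WriteLine(aware.Length); // 3
Console.WriteLine(aware[1]);     // "Smith, John" (surrounding quotes are kept)

// Splits on the delimiter only where the remainder of the line contains
// balanced double quotes, i.e. where the delimiter sits outside a quoted field.
static string[] SplitCsvLine(string text, string delimiter)
{
    var pattern = Regex.Escape(delimiter) + @"(?=(?:[^""]|""[^""]*"")*$)";
    return Regex.Split(text, pattern);
}
```

The lookahead rescans the rest of the line for every candidate delimiter, so on very wide rows a dedicated CSV parser is likely to be faster.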

Basic example using only StreamReader and StreamWriter:

    using System;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;

    var sourcePath = @"C:\data.csv";
    var delimiter = ",";
    var firstLineContainsHeaders = true;
    var tempPath = Path.GetTempFileName();
    var lineNumber = 0;
    
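    // Split on the delimiter only when it is outside double quotes.
    // Regex.Escape keeps delimiters such as "|" or "." from being read as regex syntax.
    var splitExpression = new Regex(@"(" + Regex.Escape(delimiter) + @")(?=(?:[^""]|""[^""]*"")*$)");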
    
    using (var writer = new StreamWriter(tempPath))
    using (var reader = new StreamReader(sourcePath))
    {
        string line = null;
        string[] headers = null;
        if (firstLineContainsHeaders)
        {
            line = reader.ReadLine();
            lineNumber++;
    
            if (string.IsNullOrEmpty(line)) return; // file is empty;
    
            headers = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
    
            writer.WriteLine(line); // write the original header to the temp file.
        }
    
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
    
            var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
    
            // if there are no headers, do a simple sanity check to make sure you always have the same number of columns in a line
            if (headers == null) headers = new string[columns.Length];
    
            if (columns.Length != headers.Length) throw new InvalidOperationException(string.Format("Line {0} is missing one or more columns.", lineNumber));
    
            // TODO: search and replace in columns
            // example: replace 'v' in the first column with '\/': if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", @"\/");
    
            writer.WriteLine(string.Join(delimiter, columns));
        }
    
    }
    
    File.Delete(sourcePath);
    File.Move(tempPath, sourcePath);
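
For a file this large, it may help to keep the temporary file on the same volume as the source, so that the final move is a rename rather than a second 60GB copy. The sketch below assumes that layout. `RewriteLines` and its `editLine` callback are hypothetical names, the demo file is a tiny stand-in created on the fly, and `File.ReadLines` is swapped in for the explicit StreamReader loop because it also enumerates the file lazily, one line at a time.

```csharp
using System;
using System.IO;

// Demo setup: a tiny stand-in file (hypothetical data) in the temp directory.
var demoPath = Path.Combine(Path.GetTempPath(), "rewrite-demo-" + Guid.NewGuid().ToString("N") + ".csv");
File.WriteAllLines(demoPath, new[] { "id,status", "1,old", "2,new" });

// Replace "old" with "archived" on data lines; the header (line 1) passes through untouched.
RewriteLines(demoPath, (text, lineNumber) =>
    lineNumber == 1 ? text : text.Replace("old", "archived"));

string[] result = File.ReadAllLines(demoPath);
Console.WriteLine(string.Join(" | ", result)); // id,status | 1,archived | 2,new
File.Delete(demoPath);

// Streams sourcePath through editLine one line at a time, then swaps the result into place.
static void RewriteLines(string sourcePath, Func<string, int, string> editLine)
{
    // Assumption: a sibling ".tmp" file in the same directory as the source is acceptable.
    var tempPath = sourcePath + ".tmp";
    var lineNumber = 0;

    using (var writer = new StreamWriter(tempPath))
    {
        // File.ReadLines yields lines lazily, so only one line is held in memory at a time.
        foreach (var line in File.ReadLines(sourcePath))
        {
            lineNumber++;
            writer.WriteLine(editLine(line, lineNumber));
        }
    }

    // Only touch the original once the temp file has been completely written.
    File.Delete(sourcePath);
    File.Move(tempPath, sourcePath);
}
```

If the edit throws partway through, the original file is left untouched and only the `.tmp` file needs cleaning up.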
    
