Filtering and Merging Many Large CSV Files

Problem Description

I am trying to filter and merge 300+ CSV files of roughly 50,000 KB (500k lines) each, and then output them into another CSV file. The filtering is done based on one or more of the values in the columns. I've tried to find a couple of different examples, but nothing that covers filtering, merging/appending, and NOT keeping the data in memory.

For example, I would want to merge all records for INV_ITEM_ID 8010.

All the CSV files are in the same format and would need to be filtered the same way.

 RUN_DATE   |FORECAST_SET   |INV_ITEM_ID    |FORECAST_DATE  |FORECAST_QTY
 ------------------------------------------------------------------------
 26-Mar-15  |A              |4162           |11/19/2016     | 100
 26-Mar-15  |A              |8010           |11/19/2016     | 100
 26-Mar-15  |A              |4162           |11/19/2016     | 100
 26-Mar-15  |B              |4162           |11/19/2016     | 100
 26-Mar-15  |B              |4162           |11/19/2016     | 100
 26-Mar-15  |B              |8010           |11/19/2016     | 100
 26-Mar-15  |B              |4162           |11/19/2016     | 100
 26-Mar-15  |B              |8010           |11/19/2016     | 100
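
For context, the straightforward cmdlet-only version of this task would look roughly like the sketch below. The paths and the 8010 filter are placeholders taken from the example above, and Export-Csv -Append requires PowerShell 3.0 or later; the answer that follows explains why a stream-based approach is preferable for input of this size.

Get-ChildItem 'C:\some\folder' -Filter '*.csv' | ForEach-Object {
  # Import-Csv/Export-Csv are convenient, but building an object per row
  # makes this slow for 300+ files of ~500k lines each.
  Import-Csv $_.FullName |
    Where-Object { $_.INV_ITEM_ID -eq '8010' } |
    Export-Csv 'C:\path\to\output.csv' -NoTypeInformation -Append
}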

Recommended Answer

From a performance point of view, you probably want to avoid Import-Csv/Export-Csv and go with a StreamReader/StreamWriter approach. Something like this:

$inputFolder = 'C:\some\folder'
$outputFile  = 'C:\path\to\output.csv'

$writer = New-Object IO.StreamWriter ($outputFile, $false)
$headerWritten = $false  # header of the first input file has not been copied yet

Get-ChildItem $inputFolder -File | Where-Object {
  ...  # <-- filtering criteria for selecting input files go here
} | ForEach-Object {
  $reader = New-Object IO.StreamReader ($_.FullName)
  if (-not $headerWritten) {
    # copy header line to output file once
    $writer.WriteLine($reader.ReadLine())
    $headerWritten = $true
  } else {
    # discard header line (assign to $null so it isn't sent to the pipeline)
    $null = $reader.ReadLine()
  }

  while ($reader.Peek() -ge 0) {
    $line   = $reader.ReadLine()
    $fields = $line -split ','
    if (...) {  # <-- filtering criteria for selecting output lines go here
      $writer.WriteLine($line)
    }
  }

  $reader.Close()
  $reader.Dispose()
}

$writer.Close()
$writer.Dispose()
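
For illustration only, here is the same skeleton with the two placeholders filled in for the scenario from the question. The concrete conditions are assumptions based on the sample data: every *.csv file in the folder is selected, and a row is kept when its INV_ITEM_ID (third column, index 2 after splitting on commas) equals 8010. Adjust both conditions to your actual layout.

$inputFolder = 'C:\some\folder'
$outputFile  = 'C:\path\to\output.csv'
$itemId      = '8010'          # hypothetical: the INV_ITEM_ID value to keep

$writer = New-Object IO.StreamWriter ($outputFile, $false)
$headerWritten = $false

Get-ChildItem $inputFolder -File -Filter '*.csv' | ForEach-Object {
  $reader = New-Object IO.StreamReader ($_.FullName)
  if (-not $headerWritten) {
    # copy header line to output file once
    $writer.WriteLine($reader.ReadLine())
    $headerWritten = $true
  } else {
    # discard header line
    $null = $reader.ReadLine()
  }

  while ($reader.Peek() -ge 0) {
    $line   = $reader.ReadLine()
    $fields = $line -split ','
    if ($fields[2].Trim() -eq $itemId) {  # assumption: INV_ITEM_ID is the 3rd column
      $writer.WriteLine($line)
    }
  }

  $reader.Close()
  $reader.Dispose()
}

$writer.Close()
$writer.Dispose()

Because each line is read, tested, and either written or discarded immediately, memory use stays flat no matter how many files are processed.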
