Filtering and Merging Many Large CSV Files
Question
I am trying to filter and merge 300+ CSV files of about 50,000 KB (500k lines) each and then write the result to another CSV file. The filtering is based on one or more of the values in the columns. I've looked at several examples, but none of them covers filtering, merging/appending, and NOT keeping the data in memory.
For example, I would want to merge all records for INV_ITEM_ID 8010.
All the CSV files are in the same format and would need to be filtered the same way.
RUN_DATE |FORECAST_SET |INV_ITEM_ID |FORECAST_DATE |FORECAST_QTY
------------------------------------------------------------------------
26-Mar-15 |A |4162 |11/19/2016 | 100
26-Mar-15 |A |8010 |11/19/2016 | 100
26-Mar-15 |A |4162 |11/19/2016 | 100
26-Mar-15 |B |4162 |11/19/2016 | 100
26-Mar-15 |B |4162 |11/19/2016 | 100
26-Mar-15 |B |8010 |11/19/2016 | 100
26-Mar-15 |B |4162 |11/19/2016 | 100
26-Mar-15 |B |8010 |11/19/2016 | 100
Recommended answer
From a performance point of view you probably want to avoid Import-Csv/Export-Csv and go with a StreamReader/StreamWriter approach. Something like this:
$inputFolder = 'C:\some\folder'
$outputFile  = 'C:\path\to\output.csv'

# make sure the header is written exactly once, even on a re-run in the same session
$headerWritten = $false

$writer = New-Object IO.StreamWriter ($outputFile, $false)
Get-ChildItem $inputFolder -File | Where-Object {
    ... # <-- filtering criteria for selecting input files go here
} | ForEach-Object {
    $reader = New-Object IO.StreamReader ($_.FullName)
    if (-not $headerWritten) {
        # copy header line to output file once
        $writer.WriteLine($reader.ReadLine())
        $headerWritten = $true
    } else {
        # discard header line (assign to $null so it is not emitted to the pipeline)
        $null = $reader.ReadLine()
    }
    # read one line at a time so only the current line is held in memory
    while ($reader.Peek() -ge 0) {
        $line = $reader.ReadLine()
        $fields = $line -split ','
        if (...) { # <-- filtering criteria for selecting output lines go here
            $writer.WriteLine($line)
        }
    }
    $reader.Close()
    $reader.Dispose()
}
$writer.Close()
$writer.Dispose()
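The skeleton above leaves both filter conditions open. As a rough sketch of how they might be filled in for the INV_ITEM_ID 8010 example from the question, the function below makes several assumptions that are not in the original answer: the name Merge-FilteredCsv and all of its parameters are hypothetical, the input files are selected with a plain *.csv filter, the fields are assumed to be comma-separated, and INV_ITEM_ID is assumed to be the third column (index 2). The simple split does not handle quoted fields that contain the delimiter.

```powershell
# Hypothetical helper (not part of the original answer): streams every *.csv in
# $InputFolder and appends the rows whose column $ColumnIndex equals $Value
# to $OutputFile. Only one line is held in memory at a time.
function Merge-FilteredCsv {
    param(
        [Parameter(Mandatory)] [string]$InputFolder,
        [Parameter(Mandatory)] [string]$OutputFile,   # use a full path outside $InputFolder
        [int]$ColumnIndex  = 2,                       # assumed position of INV_ITEM_ID
        [string]$Value     = '8010',
        [char]$Delimiter   = ','                      # assumed; pass '|' for pipe-delimited data
    )

    $headerWritten = $false
    $writer = New-Object IO.StreamWriter ($OutputFile, $false)
    try {
        $files = Get-ChildItem -Path $InputFolder -Filter '*.csv' -File | Sort-Object Name
        foreach ($file in $files) {
            $reader = New-Object IO.StreamReader ($file.FullName)
            try {
                # The first line of every file is the header: keep it once, skip it afterwards.
                $header = $reader.ReadLine()
                if (-not $headerWritten -and $null -ne $header) {
                    $writer.WriteLine($header)
                    $headerWritten = $true
                }
                # ReadLine() returns $null at end of file.
                while ($null -ne ($line = $reader.ReadLine())) {
                    $fields = $line.Split($Delimiter)
                    if ($fields.Count -gt $ColumnIndex -and
                        $fields[$ColumnIndex].Trim() -eq $Value) {
                        $writer.WriteLine($line)
                    }
                }
            } finally {
                $reader.Dispose()   # release the file handle even if a read fails
            }
        }
    } finally {
        $writer.Dispose()           # flushes and closes the output file
    }
}
```

A call might look like `Merge-FilteredCsv -InputFolder 'C:\some\folder' -OutputFile 'C:\path\to\output.csv'`. Full paths are the safer choice here, because the .NET stream classes resolve relative paths against the process working directory rather than the current PowerShell location. The try/finally blocks are a design choice of this sketch so that file handles are released if an error occurs partway through the 300+ files.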