Concatenate csv files in powershell, without the first line (except for the first file)

Question

I have multiple *.csv files. I want to concatenate them into a single CSV file in a powershell script. All csv files have the same header (the first line), so when I concatenate them I want to keep the first line only from the first file.

How can I do this?

Recommended answer

Note: The solution in this answer intentionally uses plain-text processing to process the files, for two reasons:

  • Use of Import-Csv and Export-Csv incurs significant processing overhead (though that may not matter in a given situation); plain-text processing is significantly faster.

  • In Windows PowerShell and PowerShell [Core] 6.x, the output will invariably have double-quoted column values, even if they weren't initially (though that should normally not matter).

      • In PowerShell [Core] 7.0+, Export-Csv and ConvertTo-Csv have a -UseQuotes parameter that lets you control quoting in the output.

That said, Import-Csv and Export-Csv are certainly the better choice whenever you need to read and interpret the data (as opposed to just copying it elsewhere) - see Sid's helpful answer.
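For comparison, the object-based alternative mentioned above might look like the following minimal sketch. The function name Merge-CsvObjects and its parameters are hypothetical, chosen only for illustration; it assumes all input files share the same header and that the output file lives outside the input folder.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Merge-CsvObjects {
  param(
    [Parameter(Mandatory)] [string] $InputDir,  # folder holding the input *.csv files
    [Parameter(Mandatory)] [string] $OutFile    # merged file; keep it outside $InputDir
  )
  # Import-Csv parses each file into objects, so the header rows are
  # consumed as property names and never show up as data rows.
  Get-ChildItem -LiteralPath $InputDir -Filter *.csv |
    Sort-Object Name |
    ForEach-Object { Import-Csv -LiteralPath $_.FullName } |
    Export-Csv -LiteralPath $OutFile -NoTypeInformation -Encoding Utf8
  # In PowerShell 7.0+, adding -UseQuotes AsNeeded to Export-Csv
  # avoids quoting every value.
}
```

A call such as Merge-CsvObjects -InputDir . -OutFile outdir/out.csv would write a single header followed by the data rows of every file. The plain-text solution that this answer recommends follows.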

# The single output file.
# Note: Best to save this in a different folder than the input
#       folder, in case you need to run multiple times.
$outFile = 'outdir/out.csv'

# Get all input CSV files as an array of file-info objects,
# from the current dir. in this example
$inFiles = @(Get-ChildItem -Filter *.csv)

# Extract the header line (column names) from the first input file
# and write it to the output file.
Get-Content $inFiles[0].FullName -First 1 | Set-Content -Encoding Utf8 $outFile

# Process all input files and append their *data* rows to the
# output file (that is, skip the header row).
# NOTE: If you only wanted to extract a given count $count of data rows
#       from each file, add -First ($count+1) to the Get-Content call.
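foreach ($file in $inFiles) {
  # Use the loop variable ($_ is not defined in a foreach statement), and
  # Add-Content to append (Set-Content has no -Append parameter).
  Get-Content $file.FullName | Select-Object -Skip 1 |
    Add-Content -Encoding Utf8 $outFile
}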

Note the use of -Encoding Utf8 as an example; adjust as needed; by default, Set-Content will use "ANSI" encoding in Windows PowerShell, and BOM-less UTF-8 in PowerShell Core.

Caveat: By doing line-by-line plain-text processing, you're relying on each text line representing a single CSV data row; this is typically true, but doesn't have to be.
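To illustrate the caveat: a quoted field may contain a line break, in which case one data row spans several text lines. The short demonstration below uses a made-up temporary file (the file name and contents are assumptions for illustration) and compares the number of text lines with the number of rows that Import-Csv parses.

```powershell
# Create a sample file whose single data row contains a quoted,
# multi-line field (file name and contents are illustrative only).
$tmp = Join-Path ([IO.Path]::GetTempPath()) ("multiline-demo-{0}.csv" -f [guid]::NewGuid())
Set-Content -LiteralPath $tmp -Value 'id,note', '1,"first line', 'second line"'

# Plain-text view: counts physical lines, including the header.
$textLines = @(Get-Content -LiteralPath $tmp).Count

# CSV-aware view: counts parsed data rows.
$rows = @(Import-Csv -LiteralPath $tmp).Count

"Text lines: $textLines; CSV data rows: $rows"

Remove-Item -LiteralPath $tmp
```

Skipping only the first text line still works for such files as long as the header itself fits on one line, but counting lines (for instance with -First) would no longer correspond to counting rows.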

Conversely, if performance is paramount, the plain-text approach above could be made significantly faster with direct use of .NET methods such as [IO.File]::ReadLines() or, if the files are small enough, even [IO.File]::ReadAllLines().
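A minimal sketch of that .NET-based variant, assuming PowerShell 5 or later (for the ::new() syntax): the function name Merge-CsvFast and its parameters are hypothetical and not part of the original answer. It streams each file with [IO.File]::ReadLines() and writes BOM-less UTF-8 through a single StreamWriter.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Merge-CsvFast {
  param(
    [Parameter(Mandatory)] [string[]] $Path,    # input CSV files, in the desired order
    [Parameter(Mandatory)] [string]   $OutFile  # merged output file
  )
  # .NET resolves relative paths against the process working directory,
  # which can differ from PowerShell's current location, so use full paths.
  $outFull = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutFile)
  $writer = [IO.StreamWriter]::new($outFull, $false, [Text.UTF8Encoding]::new($false))
  try {
    $isFirstFile = $true
    foreach ($p in $Path) {
      $full = Convert-Path -LiteralPath $p
      $lineNo = 0
      foreach ($line in [IO.File]::ReadLines($full)) {
        $lineNo++
        # Keep the header line only from the first file.
        if ($lineNo -eq 1 -and -not $isFirstFile) { continue }
        $writer.WriteLine($line)
      }
      $isFirstFile = $false
    }
  } finally {
    $writer.Dispose()
  }
}
```

It could be called as Merge-CsvFast -Path (Get-ChildItem -Filter *.csv).FullName -OutFile outdir/out.csv, with the same caveat as above that each header must occupy exactly one text line.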
