PowerShell Get-Content with basic manipulations is so slow

Question

I am merging a lot of large CSV files, skipping the leading junk lines in each file and appending the file name to every remaining line:

Get-ChildItem . | Where Name -match "Q[0-4]20[0-1][0-9].csv" | 
Foreach-Object {
    $file = $_.BaseName
    Get-Content $_.FullName | select-object -skip 3 | % {
        "$_,${file}" | Out-File -Append temp.csv -Encoding ASCII
    }
}

In PowerShell this is incredibly slow, even on an i7/16 GB machine (about 5 MB per minute). Can I make it more efficient, or should I just switch to something like Python?

Recommended answer

Get-Content and Set-Content perform terribly on larger files. .NET file APIs and streams are a good alternative when performance is key. With that in mind, let's read each file with [System.IO.File]::ReadAllLines and write the results out through a StreamWriter.

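$rootPath = "C:\temp"
# Keep the output outside $rootPath so it is not picked up as an input file.
$outputPath = "C:\test\somewherenotintemp.csv"
# Casting the path string to StreamWriter opens the file for writing,
# overwriting any existing file.
$streamWriter = [System.IO.StreamWriter]$outputPath
Get-ChildItem $rootPath -Filter "*.csv" -File | ForEach-Object {
    $file = $_.BaseName
    # Read the whole file, drop the first 3 lines, then write each
    # remaining line with the quoted base name appended.
    [System.IO.File]::ReadAllLines($_.FullName) |
        Select-Object -Skip 3 | ForEach-Object {
            $streamWriter.WriteLine(('{0},"{1}"' -f $_, $file))
        }
}
# Close() flushes buffered output and releases the file handle.
$streamWriter.Close(); $streamWriter.Dispose()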

Create a writing stream, $streamWriter, to output the edited lines from each file. We could read and write each file in larger batches, which would be faster, but we need to skip a few lines and change every remaining one, so processing line by line is simpler. Avoid writing anything to the console during this time, as it will just slow everything down.
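
For very large inputs, the pipeline itself can be taken out of the inner loop by pairing the writer with a StreamReader, so that only one line is held in memory at a time. The following is a minimal sketch of that idea, not part of the original answer: the function name Merge-CsvWithName and its parameters are hypothetical, and the ::new() syntax assumes PowerShell 5 or later.

```powershell
# Hypothetical helper illustrating a reader/writer loop; names are illustrative only.
function Merge-CsvWithName {
    param(
        [string]$RootPath,      # folder containing the input CSV files
        [string]$OutputPath,    # merged output file; keep it outside $RootPath
        [int]$SkipLines = 3     # number of leading junk lines to drop per file
    )
    $writer = [System.IO.StreamWriter]::new($OutputPath)
    try {
        foreach ($csv in Get-ChildItem -Path $RootPath -Filter '*.csv' -File) {
            $reader = [System.IO.StreamReader]::new($csv.FullName)
            try {
                $lineNumber = 0
                # ReadLine() returns $null at end of file.
                while ($null -ne ($line = $reader.ReadLine())) {
                    $lineNumber++
                    if ($lineNumber -le $SkipLines) { continue }
                    # Append the quoted base name as the last column.
                    $writer.WriteLine(('{0},"{1}"' -f $line, $csv.BaseName))
                }
            }
            finally { $reader.Dispose() }
        }
    }
    finally { $writer.Dispose() }   # flushes and closes even if an error occurs
}
```

It could be called as Merge-CsvWithName -RootPath 'C:\temp' -OutputPath 'C:\test\somewherenotintemp.csv'. Full paths are the safer choice here, because .NET resolves relative paths against the process working directory, which may differ from the current PowerShell location.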

What '{0},"{1}"' -f $_,$file does is wrap the added last "column" in double quotes, in case the base name contains spaces.
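
To see the format operator in isolation, here is a small illustration with made-up values; the row content and the file name are assumptions chosen only for the demonstration.

```powershell
$line = 'a,b,c'      # hypothetical data row read from a file
$file = 'Q1 2015'    # hypothetical base name containing a space

# {0} is replaced by $line and {1} by $file; the literal double quotes wrap the file name.
$result = '{0},"{1}"' -f $line, $file
$result              # a,b,c,"Q1 2015"
```

The quotes also keep a base name that contains a comma in a single field. A base name that itself contains a double quote would need that quote doubled to stay valid CSV, which this simple format string does not handle.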
