PowerShell is slow (much slower than Python) in large Search/Replace operation?

Question

I have 265 CSV files with over 4 million total records (lines), and I need to do a search and replace in all of them. The PowerShell snippet below does this, but it takes 17 minutes to run:

ForEach ($file in Get-ChildItem C:\temp\csv\*.csv) 
{
    $content = Get-Content -path $file
    $content | foreach {$_ -replace $SearchStr, $ReplaceStr} | Set-Content $file
}

The following Python code does the same thing but takes less than 1 minute to run:

import os, fnmatch

def findReplace(directory, find, replace, filePattern):
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, filePattern):
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace)
            with open(filepath, "w") as f:
                f.write(s)

findReplace("c:/temp/csv", "Search String", "Replace String", "*.csv")

Why is the Python method so much more efficient? Is my PowerShell code inefficient, or is Python just a more powerful programming language when it comes to text manipulation?

Recommended answer

Give this PowerShell script a try. It should perform much better. It also uses much less RAM, because the file is read through a buffered stream one line at a time instead of being loaded whole.

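# $SearchStr and $ReplaceStr are assumed to be defined beforehand, as in the question.
$reader = [IO.File]::OpenText("C:\input.csv")
$writer = New-Object System.IO.StreamWriter("C:\output.csv")

try {
    # Read, replace, and write one line at a time.
    while ($reader.Peek() -ge 0) {
        $line = $reader.ReadLine()
        $line2 = $line -replace $SearchStr, $ReplaceStr
        $writer.WriteLine($line2)
    }
}
finally {
    # Close both streams even if the loop throws, so buffered output is flushed.
    $reader.Close()
    $writer.Close()
}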

This processes a single file, but you can use it to test performance and, if the result is acceptable, wrap it in a loop over all the files.
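For comparison with the Python version in the question, the same line-by-line streaming idea, wrapped in a loop over a directory, could be sketched in Python roughly as follows. This is only an illustration, not the answerer's code: the names `stream_replace` and `replace_in_directory` are hypothetical, the files are assumed to be in the platform's default text encoding, and the search string is assumed never to span a line break.

```python
import fnmatch
import os
import tempfile


def stream_replace(src_path, find, replace):
    """Replace `find` with `replace` in one file, reading it line by line."""
    dir_name = os.path.dirname(os.path.abspath(src_path))
    # Write to a temporary file in the same directory, then swap it in,
    # so the original is never left half-written.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", newline="") as dst, open(src_path, newline="") as src:
            for line in src:
                dst.write(line.replace(find, replace))
        os.replace(tmp_path, src_path)
    except BaseException:
        os.remove(tmp_path)
        raise


def replace_in_directory(directory, find, replace, pattern="*.csv"):
    """Apply stream_replace to every matching file directly in `directory`."""
    for name in fnmatch.filter(os.listdir(directory), pattern):
        stream_replace(os.path.join(directory, name), find, replace)
```

A call such as `replace_in_directory("c:/temp/csv", "Search String", "Replace String")` would mirror the question's usage. Unlike the question's `os.walk` version, this sketch does not descend into subdirectories.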

Alternatively, you can use Get-Content with -ReadCount to read a number of lines into memory at a time, perform the replacement, and then write the updated chunk through the PowerShell pipeline.

Get-Content "C:\input.csv" -ReadCount 512 | % {
    $_ -replace $SearchStr, $ReplaceStr
} | Set-Content "C:\output.csv"

To squeeze out a little more performance, you can also compile the regex (-replace uses regular expressions) like this:

$re = New-Object Regex $SearchStr, 'Compiled'
$re.Replace( $_ , $ReplaceStr )
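One consequence of `-replace` using regular expressions is that the two snippets in the question are not strictly equivalent: Python's `str.replace` treats the search string literally, whereas a regex treats characters such as `.` as metacharacters. The difference, and the idea of compiling a pattern once and reusing it for every line, can be illustrated in Python. This is a minimal sketch with hypothetical helper names (`literal_replace`, `make_regex_replacer`). In PowerShell, `[regex]::Escape($SearchStr)` plays the role that `re.escape` plays here.

```python
import re


def literal_replace(text, find, replace):
    """Plain substring replacement; `find` is never interpreted as a pattern."""
    return text.replace(find, replace)


def make_regex_replacer(pattern, replace, literal=False):
    """Compile `pattern` once and return a function that applies it to a line."""
    compiled = re.compile(re.escape(pattern) if literal else pattern)
    # A function replacement avoids backslash/group processing in `replace`.
    return lambda line: compiled.sub(lambda _match: replace, line)


if __name__ == "__main__":
    sample = "a.b axb"
    print(literal_replace(sample, "a.b", "X"))                   # literal match only
    print(make_regex_replacer("a.b", "X")(sample))               # "." matches any character
    print(make_regex_replacer("a.b", "X", literal=True)(sample)) # escaped, so literal again
```

If the search string is meant literally, escaping it before building the regex keeps the PowerShell behavior in line with the Python version.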
