PowerShell - Optimizing a very, very large CSV and text file search and replace

Question

I have a directory with ~3000 text files in it, and I'm doing periodic search-and-replace passes on those text files as I transition a program to a new server.

Each text file has ~3000 lines on average, and I need to search the files for 300-1000 terms at a time.
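
For a sense of scale, taking the upper end of those figures and assuming every term triggers one full scan of every file (a terms-outer, files-inner loop):

```latex
3000\ \text{files} \times 1000\ \text{terms} = 3 \times 10^{6}\ \text{file scans},
\qquad
3 \times 10^{6} \times 3000\ \text{lines} = 9 \times 10^{9}\ \text{line checks}
```

Reading each file once and applying all terms to it reduces the file reads to 3000.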

I'm replacing the server prefix associated with each search string. So for every entry in the CSV, I'm looking for either Search_String or \\Old_Server\Search_String, and making sure that after the program completes, the result is \\New_Server\Search_String.
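
That rule can be expressed as a single regex per term. The sketch below is a swapped-in alternative to the three chained -replace calls used later, not the author's method; Convert-QualifiedPoint is a hypothetical name. The optional group consumes an existing \\Old\ prefix, and a negative lookbehind skips terms that already carry the \\New\ prefix, so running it twice changes nothing. Like the original -replace, it is case-insensitive and does not check word boundaries.

```powershell
# Hypothetical helper: rewrite one search term to carry the new server prefix.
function Convert-QualifiedPoint {
    param(
        [string[]]$Text,   # one or more lines of file content
        [string]$Old,      # old qualifier, e.g. F24
        [string]$New,      # new qualifier, e.g. CB3
        [string]$Point     # search term from the CSV
    )
    $oldEsc   = [regex]::Escape($Old)
    $newEsc   = [regex]::Escape($New)
    $pointEsc = [regex]::Escape($Point)
    # (?<!\\\\NEW\\)  do not touch a term that already has the \\NEW\ prefix
    # (?:\\\\OLD\\)?  consume an \\OLD\ prefix if there is one
    $pattern = '(?<!\\\\' + $newEsc + '\\)(?:\\\\' + $oldEsc + '\\)?' + $pointEsc
    # Only '$' is special in a .NET replacement string, so double it.
    $replacement = ('\\' + $New + '\' + $Point).Replace('$', '$$')
    $Text -replace $pattern, $replacement
}

Convert-QualifiedPoint -Text 'PUMP1 and \\F24\PUMP1 and \\CB3\PUMP1' -Old 'F24' -New 'CB3' -Point 'PUMP1'
# \\CB3\PUMP1 and \\CB3\PUMP1 and \\CB3\PUMP1
```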

I cobbled together a PowerShell program, and it works, but it's so slow I've never seen it complete.

Any suggestions?

EDIT 1: I changed Get-Content as suggested, but it still took 3 minutes to search two files (~8000 lines) for 9 separate search terms. I must still be screwing something up; a Notepad++ search and replace would still be way faster if done manually 9 times.

I'm not sure how to get rid of the first Get-Content call, because I want to make a backup copy of the file before I make any changes to it.
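
Copy-Item copies the file on disk without loading it into PowerShell, so the backup does not need its own Get-Content. A minimal sketch, assuming a hypothetical helper named Get-ContentWithBackup and a backup directory that already exists: it copies the file only if no backup is there yet (so later search terms cannot overwrite the untouched copy), then reads the file once.

```powershell
# Hypothetical helper (not from the original script): back up a file once,
# then return its lines for in-memory search and replace.
function Get-ContentWithBackup {
    param(
        [string]$Path,       # file to read
        [string]$BackupDir   # existing directory that holds the backups
    )
    $target = Join-Path $BackupDir (Split-Path $Path -Leaf)
    # Copy-Item works on the file itself; no Get-Content is needed for the backup.
    # Skip the copy if a backup already exists, so it always holds the original text.
    if (-not (Test-Path -LiteralPath $target)) {
        Copy-Item -LiteralPath $Path -Destination $target
    }
    # One read of the whole file as an array of lines.
    Get-Content -LiteralPath $Path -ReadCount 0
}
```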

EDIT 2: So this is an order of magnitude faster; it's searching a file in maybe 10 seconds. But now it doesn't write changes to files, and it only searches the first file in the directory! I didn't change that code, so I don't know why it broke.

EDIT 3: Success! I adapted a solution posted below to make it much, much faster. It's searching each file in a couple of seconds now. I may reverse the loop order, so that it loads each file into an array once and then searches and replaces every entry from the CSV, rather than the other way around. I'll post that if I get it to work.
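
The reversed loop order from EDIT 3 could look roughly like this: read each file once, apply every CSV term, and write the file back (after one backup) only if something changed. Going one step further than the edit describes, this sketch folds all terms into a single alternation regex and uses a negative lookbehind in place of the script's $DuplicateNew cleanup. Update-CtxFile is a hypothetical name, and $BackupDir is assumed to exist.

```powershell
# Hypothetical helper: apply every CSV term to one file in a single pass.
function Update-CtxFile {
    param(
        [string]$Path,       # .ctx file to update
        [string[]]$Points,   # all search terms from the CSV
        [string]$Old,        # old qualifier, e.g. F24
        [string]$New,        # new qualifier, e.g. CB3
        [string]$BackupDir   # existing directory for untouched copies
    )
    # Drop blank terms; sort longest first so a long term is tried before a shorter prefix of it.
    $alternation = ($Points | Where-Object { $_ } | Sort-Object Length -Descending |
        ForEach-Object { [regex]::Escape($_) }) -join '|'
    if (-not $alternation) { return $false }
    # (?<!\\\\NEW\\)  skip terms already prefixed with \\NEW\
    # (?:\\\\OLD\\)?  absorb an existing \\OLD\ prefix if present
    # (t1|t2|...)     capture whichever term matched as $1
    $pattern = '(?<!\\\\' + [regex]::Escape($New) + '\\)(?:\\\\' +
        [regex]::Escape($Old) + '\\)?(' + $alternation + ')'
    # Only '$' is special in a .NET replacement string; $1 is the matched term.
    $replacement = '\\' + $New.Replace('$', '$$') + '\$1'

    $lines   = Get-Content -LiteralPath $Path -ReadCount 0   # one read per file
    $updated = $lines -replace $pattern, $replacement        # one regex pass for all terms

    # Leave unchanged files alone: no backup, no rewrite.
    if (($lines -join "`n") -ceq ($updated -join "`n")) { return $false }
    Copy-Item -LiteralPath $Path -Destination $BackupDir
    Set-Content -LiteralPath $Path -Value $updated
    return $true
}
```

In the script it would be called once per file, for example Get-ChildItem -Filter *.ctx | ForEach-Object { Update-CtxFile -Path $_.FullName -Points $points.find -Old $old -New $new -BackupDir $DirName }, so each file is read and written at most once per run instead of once per term.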

The final script is below.

#get input from the user
$old = Read-Host 'Enter the old cimplicity qualifier (F24, IRF3 etc.)'
$new = Read-Host 'Enter the new cimplicity qualifier (CB3, F24_2 etc.)'
$DirName = Get-Date -Format "yyyy_MM_dd_HH_mm"   #HH = 24-hour clock, so a 2 PM run can't reuse a 2 AM run's folder

New-Item -ItemType Directory -Path $DirName -Force | Out-Null
$logfile = "$DirName\log.txt"
New-Item $logfile -ItemType File -Force -Value "`nMatched CTX files on $DirName`n" | Out-Null

$VerbosePreference = "SilentlyContinue"

$points = Import-Csv SearchAndReplace.csv -Header find   #Import CSV file (one search term per row)
$ctxfiles = Get-ChildItem -Path . -Filter *.ctx | Select-Object -ExpandProperty FullName   #List the CTX files once, not once per term

#Escape user input so characters such as '.' or '$' are matched literally by -replace
$oldEsc = [regex]::Escape($old)
$newEsc = [regex]::Escape($new)
$DuplicateNew = "\\\\" + $newEsc + "\\" + "\\\\" + $newEsc + "\\"   #Regex for \\NEW\\\NEW\
$QualifiedNew = "\\" + $new + "\"                                  #Replacement text \\NEW\

$points | ForEach-Object {   #For each row of points in the CSV file
    $findvar = $_.find       #Store column 1 as the string to search for
    $findEsc = [regex]::Escape($findvar)

    $OldQualifiedPoint = "\\\\" + $oldEsc + "\\" + $findEsc   #Regex: each literal backslash is doubled
    $NewQualifiedPoint = "\\" + $new + "\" + $findvar         #Replacement text: backslashes are NOT escaped

    foreach ($FileName in $ctxfiles) {   #Iterate through each CTX file
        $DateTime = Get-Date -Format "HH:mm:ss"
        Write-Host "$DateTime - $findvar - Checking $FileName"
        #Only read and rewrite files that contain the term (-SimpleMatch = literal text, not regex)
        if (Select-String -Path $FileName -Pattern $findvar -SimpleMatch -Quiet) {
            Add-Content $logfile "`n$DateTime - Found $findvar in $FileName"
            Write-Host "$DateTime - Found $findvar in $FileName"

            #Back up each file once, before its first change, so later terms can't overwrite the original copy
            $backup = Join-Path $DirName (Split-Path $FileName -Leaf)
            if (-not (Test-Path $backup)) {
                Copy-Item $FileName -Destination $DirName
            }

            $FileContent = Get-Content $FileName -ReadCount 0   #Whole file as one array of lines
            $FileContent = $FileContent -replace $OldQualifiedPoint, $NewQualifiedPoint -replace $findEsc, $NewQualifiedPoint -replace $DuplicateNew, $QualifiedNew
            $FileContent | Set-Content $FileName
        }
    }
}


Recommended answer

If I'm reading this correctly, you should be able to read a 3000-line file into memory and do those replaces as an array operation, eliminating the need to iterate through each line. You can also chain those replace operations into a single command.

dir . *.ctx | #Grab all CTX files
    select -expand fullname | #Grab all of those file names and...
    foreach { #Iterate through each file
        $DateTime = Get-Date -Format "hh:mm:ss"
        $FileName = $_
        Write-Host "$DateTime - $FindVar - Checking $FileName"
        #Check file contents, and copy matching files to the newly created directory
        If (Select-String -Path $_ -Pattern $findvar -Quiet) {
            #Copy only if there is no backup yet, so a later term doesn't overwrite the original copy
            If (!(Test-Path (Join-Path $DirName (Split-Path $FileName -Leaf)))) {
                Copy $FileName -Destination $DirName
            }
            Add-Content $logfile "`n$DateTime - Found $Findvar in $filename"
            Write-Host "$DateTime - Found $Findvar in $filename"

            #Read the whole file as one array and run the chained replaces over every line at once
            $FileContent = Get-Content $Filename -ReadCount 0
            $FileContent = $FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
            $FileContent | Set-Content $FileName
        }
    }
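
To see the array behavior on its own, here is a sketch with made-up sample lines standing in for Get-Content $FileName -ReadCount 0 (F24, CB3 and PUMP1 are placeholder values). It builds the question's three replacements with [regex]::Escape instead of hand-doubled backslashes. Each -replace in the chain processes every element of the array, and the next one runs on that result.

```powershell
# Sample values (placeholders); in the script, $lines would come from
# Get-Content $FileName -ReadCount 0.
$old = 'F24'
$new = 'CB3'
$findvar = 'PUMP1'
$lines = @('\\F24\PUMP1 on', 'PUMP1 off', '\\CB3\PUMP1 idle')

# Literal text for each step; [regex]::Escape turns it into a safe pattern.
$oldQualified = '\\' + $old + '\' + $findvar            # \\F24\PUMP1
$newQualified = '\\' + $new + '\' + $findvar            # \\CB3\PUMP1 (replacement text)
$duplicate    = '\\' + $new + '\' + '\\' + $new + '\'   # \\CB3\\\CB3\
$qualifiedNew = '\\' + $new + '\'                       # \\CB3\

$oldPattern  = [regex]::Escape($oldQualified)
$findPattern = [regex]::Escape($findvar)
$dupPattern  = [regex]::Escape($duplicate)

# One statement: each -replace processes every element of the array.
$result = $lines -replace $oldPattern, $newQualified -replace $findPattern, $newQualified -replace $dupPattern, $qualifiedNew
$result
# \\CB3\PUMP1 on
# \\CB3\PUMP1 off
# \\CB3\PUMP1 idle
```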

On another note, Select-String takes the file path as an argument (-Path), so you don't have to run Get-Content and then pipe its output to Select-String.
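
For the filtering step, Select-String can scan the files itself. A minimal sketch, assuming a hypothetical helper named Get-FilesContaining: -Path accepts a wildcard, -SimpleMatch treats the term as literal text rather than a regex, and -List reports at most one match per file, which is all that's needed to decide whether a file should be opened for replacement.

```powershell
# Hypothetical helper: full paths of the .ctx files in $Directory that contain $Term.
function Get-FilesContaining {
    param(
        [string]$Directory,  # folder holding the .ctx files
        [string]$Term        # literal text to look for
    )
    # -Path accepts wildcards, so no Get-Content or dir pipeline is needed.
    # -SimpleMatch: literal match; -List: at most one match reported per file.
    Select-String -Path (Join-Path $Directory '*.ctx') -Pattern $Term -SimpleMatch -List |
        Select-Object -ExpandProperty Path
}
```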
