Iterate a Windows ASCII text file, find all instances of {LINE2 1-9999}, and replace them with {LINE2 "line number the code is on"}. Overwrite. Faster?

Problem description

This code works. I just want to see how much faster someone can make it work.

Back up your Windows 10 batch file in case something goes wrong. Find all instances of the string {LINE2 1-9999} and replace each one with {LINE2 "line number the code is on"}. Overwrite the file, encoding it as ASCII.

If _61.bat is:

TITLE %TIME%   NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE   LINE2 1243
TITLE %TIME%   DOC/SET YQJ8   LINE2 1887
SET ztitle=%TIME%: WINFOLD   LINE2 2557
TITLE %TIME%   _*.* IN WINFOLD   LINE2 2597
TITLE %TIME%   %%ZDATE1%% YQJ25   LINE2 3672
TITLE %TIME%   FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 4922

Result:

TITLE %TIME%   NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE   LINE2 1
TITLE %TIME%   DOC/SET YQJ8   LINE2 2
SET ztitle=%TIME%: WINFOLD   LINE2 3
TITLE %TIME%   _*.* IN WINFOLD   LINE2 4
TITLE %TIME%   %%ZDATE1%% YQJ25   LINE2 5
TITLE %TIME%   FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 6

Code:

Copy-Item $env:windir\_61.bat -d $env:temp\_61.bat
(gc $env:windir\_61.bat) | foreach -Begin {$lc = 1} -Process {
    $_ -replace "LINE2 \d*", "LINE2 $lc";
    $lc += 1
} | Out-File -Encoding Ascii $env:windir\_61.bat

I'd like this to take less than 984 milliseconds; currently it takes 984 milliseconds. Can you think of anything to speed it up?

Recommended answer

The key to better performance in PowerShell code (short of embedding C# code compiled on demand with Add-Type, which may or may not help) is to:

  • avoid cmdlets and the pipeline in general,
    • especially invoking a script block ({...}) for each pipeline input object, as ForEach-Object does.

To be clear: the pipeline and cmdlets have definite benefits, so avoid them only when optimizing performance is a must.
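As a minimal illustration of that advice, here is the question's renumbering logic written with the foreach statement instead of a pipeline plus ForEach-Object, so no script block is invoked per line. This is a sketch under one assumption: the in-memory $lines array stands in for the file's content, which in practice could come from [IO.File]::ReadAllLines() given a full path.

```powershell
# Sketch: renumber LINE2 markers with the foreach *statement* (no pipeline,
# no per-object script block invocation).
# Assumption: $lines stands in for the file's lines.
$lines = @(
  'TITLE %TIME%   DOC/SET YQJ8   LINE2 1887'
  'ECHO no marker on this line'
  'SET ztitle=%TIME%: WINFOLD   LINE2 2557'
)
$lc = 0
$updatedLines = foreach ($line in $lines) {
  ++$lc  # 1-based number of the current line; a bare increment produces no output
  $line -replace 'LINE2 \d+', "LINE2 $lc"
}
$updatedLines
```

Lines without a marker pass through unchanged but still advance the counter, so each marker receives the number of the line it sits on.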

In your case, the following code, which combines the switch statement with direct use of the .NET Framework for file I/O, seems to offer the best performance. Note that the input file is read into memory as a whole, as an array of lines, and that a copy of that array with the modified lines is created before it is written back to the input file:

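      $file = "$env:temp\_61.bat" # must be a *full* path.
      $lc = 0 # line counter
      $updatedLines = & { switch -Regex -File $file {
        # Keep the text around the number; substitute the current line number.
        '^(.*? LINE2 )\d+(.*)$' { $Matches[1] + ++$lc + $Matches[2] }
        default { ++$lc; $_ } # pass non-matching lines through
      } }
      # Overwrite the input file with the updated lines, ASCII-encoded.
      [IO.File]::WriteAllLines($file, $updatedLines, [Text.Encoding]::ASCII)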
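
The code insists that $file be a full path because .NET methods resolve relative paths against the process's working directory, which can differ from PowerShell's current location. A minimal sketch of one way to get such a path from a relative one uses Convert-Path, which resolves against PowerShell's location (the file must already exist). The file name demo_61.bat is hypothetical and is created in the temp directory only for illustration.

```powershell
# Sketch: convert a path relative to PowerShell's location into the full path
# that .NET methods such as [IO.File]::WriteAllLines() need.
# Assumption: demo_61.bat is a hypothetical file created only for this demo.
Push-Location ([IO.Path]::GetTempPath())
try {
  Set-Content -LiteralPath demo_61.bat -Value 'TITLE %TIME%   LINE2 1243'
  # Convert-Path resolves against PowerShell's current location, not .NET's.
  $file = Convert-Path -LiteralPath demo_61.bat
  [IO.Path]::IsPathRooted($file)   # True: safe to pass to .NET methods
  [IO.File]::ReadAllLines($file)   # .NET finds the file regardless of its own cwd
} finally {
  Remove-Item -LiteralPath demo_61.bat -ErrorAction Ignore
  Pop-Location
}
```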
      

Note:

  • Enclosing the switch statement in & { ... } is an obscure performance optimization, explained in a separate Stack Overflow answer.

  • If case-sensitive matching is sufficient, as the sample input suggests, you can improve performance a little more by adding the -CaseSensitive option to the switch statement.
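
A minimal sketch of the -CaseSensitive variant follows. To keep it self-contained it switches over an in-memory array instead of -File $file (an assumption of this example; switch -Regex -CaseSensitive -File $file combines the same way). The lowercase "line2" line is included to show that it is no longer matched.

```powershell
# Sketch: switch-based renumbering with -CaseSensitive.
# Assumption: an in-memory array replaces -File $file for self-containment.
$lines = 'TITLE %TIME%   DOC/SET YQJ8   LINE2 1887', 'REM line2 42', 'ECHO no marker'
$lc = 0
$updatedLines = & { switch -Regex -CaseSensitive ($lines) {
  '^(.*? LINE2 )\d+(.*)$' { $Matches[1] + ++$lc + $Matches[2] }
  default { ++$lc; $_ } # non-matching lines, including lowercase 'line2', pass through
} }
$updatedLines
```

Without -CaseSensitive, the second line would also match and become 'REM line2 2'.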

In my tests (see below), this provided a more than 4-fold performance improvement in Windows PowerShell relative to your command.

Here is a performance comparison made with the Time-Command function, a helper that is not built into PowerShell and must be defined before running the benchmark.

The commands compared are:

  • The switch command above.

  • A slightly streamlined version of your own command.

  • A PowerShell Core v6.1+ alternative that uses the -replace operator with the array of lines as the LHS and a script block as the replacement expression.
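
The script-block form of -replace can be tried on its own with a short sketch (PowerShell v6.1+ only). Inside the script block, $_ is the regex Match object for the current match, and whatever the block outputs becomes the replacement text. Two points are worth noting. First, this example keeps the counter in a hashtable (a design choice of this sketch, not of the benchmark below) so that incrementing it always updates the same object, whatever scope the script block runs in. Second, this approach counts matches rather than lines, so it yields line numbers only when every line carries a LINE2 marker, as in the sample file.

```powershell
# Sketch (PowerShell v6.1+): -replace with a script block as the replacement.
# $_ inside the block is the [System.Text.RegularExpressions.Match] being replaced.
# The hashtable counter is shared by reference, so increments are never lost to scoping.
$counter = @{ n = 0 }
$lines = 'TITLE %TIME%   DOC/SET YQJ8   LINE2 1887', 'SET ztitle=%TIME%: WINFOLD   LINE2 2557'
# The lookbehind leaves ' LINE2 ' intact and replaces only the digits after it.
$updatedLines = $lines -replace '(?<= LINE2 )\d+', { (++$counter.n) }
$updatedLines
```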

Instead of a 6-line sample file, a 6,000-line file is used, and 100 runs are averaged; both parameters are easy to adjust.

      # Sample file content (6 lines)
      $fileContent = @'
      TITLE %TIME%   NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE   LINE2 1243
      TITLE %TIME%   DOC/SET YQJ8   LINE2 1887
      SET ztitle=%TIME%: WINFOLD   LINE2 2557
      TITLE %TIME%   _*.* IN WINFOLD   LINE2 2597
      TITLE %TIME%   %%ZDATE1%% YQJ25   LINE2 3672
      TITLE %TIME%   FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 4922
      
      '@
      
      # Determine the full path to a sample file.
      # NOTE: Using the *full* path is a *must* when calling .NET methods, because
      #       the latter generally don't see the same working dir. as PowerShell.
      $file = "$PWD/test.bat"
      
      # Create the sample file with the sample content repeated N times.
      $repeatCount = 1000 # -> 6,000 lines
      [IO.File]::WriteAllText($file, $fileContent * $repeatCount)
      
      # Warm up the file cache and count the lines.
      $lineCount = [IO.File]::ReadAllLines($file).Count
      
      # Define the commands to compare as an array of scriptblocks.
      $commands =
        { # switch -Regex -File + [IO.File]::Read/WriteAllLines()
          $i = 0
          $updatedLines = & { switch -Regex -File $file {
            '^(.*? LINE2 )\d+(.*)$' { $Matches[1] + ++$i + $Matches[2] }
            default { ++$i; $_ } # pass non-matching lines through
          } }
          [IO.File]::WriteAllLines($file, $updatedLines, [text.encoding]::ASCII)
        },
        { # Get-Content + -replace + Set-Content
          (Get-Content $file) | ForEach-Object -Begin { $i = 1 } -Process {
            $_ -replace "LINE2 \d*", "LINE2 $i"
            ++$i
          } | Set-Content -Encoding Ascii $file
        }
      
      # In PS Core v6.1+, also test -replace with a scriptblock operand.
      $psv = $PSVersionTable.PSVersion # v6.1 or later, including v7+
      if ($psv.Major -gt 6 -or ($psv.Major -eq 6 -and $psv.Minor -ge 1)) {
        $commands +=
          { # -replace with scriptblock + [IO.File]::Read/WriteAllLines()
            $i = 0
            [IO.File]::WriteAllLines($file,
              ([IO.File]::ReadAllLines($file) -replace '(?<= LINE2 )\d+', { (++$i) }),
              [text.encoding]::ASCII
            )
          }
      } else {
        Write-Warning "Skipping -replace-with-scriptblock command, because it isn't supported in this PS version."
      }
      
      # How many runs to average.
      $runs = 100
      
      Write-Verbose -vb "Averaging $runs runs with a $lineCount-line file of size $('{0:N2} MB' -f ((Get-Item $file).Length / 1mb))..."
      
      Time-Command -Count $runs -ScriptBlock $commands
      

Here are sample results from my Windows 10 machine (the absolute timings aren't important, but hopefully the relative performance shown in the Factor column is somewhat representative); the PowerShell Core version used is v6.2.0-preview.4.

      # Windows 10, Windows PowerShell v5.1
      
      WARNING: Skipping -replace-with-scriptblock command, because it isn't supported in this PS version.
      VERBOSE: Averaging 100 runs with a 6000-line file of size 0.29 MB...
      
      Factor Secs (100-run avg.) Command
      ------ ------------------- -------
      1.00   0.108               # switch -Regex -File + [IO.File]::Read/WriteAllLines()...
      4.22   0.455               # Get-Content + -replace + Set-Content...
      
      
      # Windows 10, PowerShell Core v6.2.0-preview 4
      
      VERBOSE: Averaging 100 runs with a 6000-line file of size 0.29 MB...
      
      Factor Secs (100-run avg.) Command
      ------ ------------------- -------
      1.00   0.101               # switch -Regex -File + [IO.File]::Read/WriteAllLines()…
      1.67   0.169               # -replace with scriptblock + [IO.File]::Read/WriteAllLines()…
      4.98   0.503               # Get-Content + -replace + Set-Content…
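
If Time-Command isn't available, the built-in Measure-Command cmdlet can serve as a rough substitute. The following is a sketch only: Get-AverageRuntime is a hypothetical name, and unlike a dedicated benchmarking helper it does no warm-up runs. It averages each script block's runtime and reports a Factor relative to the fastest one, similar in spirit to the tables above.

```powershell
# Sketch: rough, built-in-only substitute for the Time-Command helper.
# Get-AverageRuntime is a hypothetical function, not part of PowerShell.
function Get-AverageRuntime {
  param(
    [scriptblock[]] $ScriptBlock,  # the commands to compare
    [int] $Count = 10              # how many runs to average per command
  )
  $results = foreach ($sb in $ScriptBlock) {
    $total = 0.0
    for ($n = 0; $n -lt $Count; $n++) {
      # Measure-Command discards the block's output and returns a [timespan].
      $total += (Measure-Command -Expression $sb).TotalSeconds
    }
    [pscustomobject] @{ Secs = $total / $Count; Command = $sb.ToString().Trim() }
  }
  # Sort fastest first and express each average relative to the fastest.
  $sorted = @($results | Sort-Object Secs)
  $fastest = $sorted[0].Secs
  foreach ($r in $sorted) {
    [pscustomobject] @{
      Factor  = [math]::Round($r.Secs / $fastest, 2)
      Secs    = [math]::Round($r.Secs, 3)
      Command = $r.Command
    }
  }
}
```

With the benchmark's variables in place, it would be called as Get-AverageRuntime -ScriptBlock $commands -Count $runs.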
      
      
