Windows PowerShell - delete a line by line number


Question

I have a large CSV file (1.6 GB). How can I delete a specific line, e.g., line 1005?

Recommended answer

Note: The solutions below remove a single line, identified by line number, from any text-based file. As marsze points out, additional considerations may apply to CSV files: care must be taken not to remove the header row, and a row may span multiple lines if its values contain embedded newlines. In that case, a CSV parser is the better choice.
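For the CSV case mentioned in the note, a minimal sketch using the built-in Import-Csv and Export-Csv cmdlets might look as follows. The file names and the $recordNo counter are made up for illustration. Note that this counts data records (the header is not counted), and that Export-Csv rewrites the file in its own quoting style, so the output is not guaranteed to be byte-identical to the input apart from the removed record.

```powershell
# Hypothetical sample files in the temp directory.
$in  = Join-Path ([IO.Path]::GetTempPath()) 'csv-demo-in.csv'
$out = Join-Path ([IO.Path]::GetTempPath()) 'csv-demo-out.csv'

# Build a small CSV whose second record contains an embedded newline,
# so that record occupies two physical lines in the file.
Set-Content -Path $in -Value 'id,note', '1,"first"', "2,`"spans`ntwo lines`"", '3,"third"'

# Remove the 2nd data record. The counter is incremented once per parsed
# record, not once per physical line, and the header is never counted.
$recordNo = 0
Import-Csv $in |
  Where-Object { ++$recordNo -ne 2 } |
    Export-Csv -NoTypeInformation -Encoding UTF8 $out

# Inspect the remaining records.
Import-Csv $out
```

Because Import-Csv creates an object per record, this is likely to be considerably slower than the plain-text approaches on a file of this size.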

If performance isn't paramount, here's a memory-friendly, pipeline-based way to do it:

Get-Content file.txt | 
  Where-Object ReadCount -ne 1005 |
    Set-Content -Encoding Utf8 new-file.txt

Get-Content adds a (somewhat obscurely named) .ReadCount property to each line it outputs, which contains the 1-based line number.

  • Note that Get-Content does not preserve the input file's character encoding, so you should control Set-Content's output encoding explicitly, as shown above, where UTF-8 is used as an example.
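To see the .ReadCount property at work, here is a small sketch on a five-line sample file. The file name and the line number 3 are made up for illustration.

```powershell
# Create a small sample file (hypothetical name) with five lines.
$sample = Join-Path ([IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $sample -Value 'a', 'b', 'c', 'd', 'e'

# Show each line next to the 1-based line number that Get-Content attached to it.
Get-Content $sample | ForEach-Object { '{0}: {1}' -f $_.ReadCount, $_ }

# Drop line 3 with the same kind of filter as in the command above.
$kept = Get-Content $sample | Where-Object ReadCount -ne 3
$kept
```

The last command should list a, b, d, and e, since only the line whose .ReadCount equals 3 is filtered out.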

Unless you read the whole file into memory, you must write to a new file, at least temporarily; you can then replace the original file with the temporary output file as follows:
Move-Item -Force new-file.txt file.txt

A faster but memory-intensive alternative, based on direct use of the .NET framework, also allows you to update the file in place:

$file = 'file.txt'
$lines = [IO.File]::ReadAllLines("$PWD/$file")
Set-Content -Encoding UTF8 $file -Value $lines[0..1003 + 1005..($lines.Count-1)]

  • Note the need to use "$PWD/$file", i.e., to explicitly prepend the current directory path to the relative path stored in $file, because the .NET framework's idea of the current directory differs from PowerShell's.

  • While $lines = Get-Content $file would be functionally equivalent to $lines = [IO.File]::ReadAllLines("$PWD/$file"), it would perform noticeably worse.

  • 0..1003 creates an array of indices from 0 to 1003; + concatenates that array with the indices from 1005 through the end of the input array. Note that array indices are 0-based, whereas line numbers are 1-based, so line 1005 is the element at index 1004.

  • Also note that the resulting array is passed to Set-Content as a direct argument via -Value, which is faster than passing it via the pipeline (... | Set-Content ...), where it would be processed element by element.
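The index arithmetic described above can be tried out on a small in-memory array. In this sketch, $n is a hypothetical variable holding the 1-based line number to delete, and the sample array stands in for the result of ReadAllLines.

```powershell
# Stand-in for the array returned by [IO.File]::ReadAllLines().
$lines = 'one', 'two', 'three', 'four', 'five'
$n = 3   # 1-based line number to delete (hypothetical variable)

# Indices 0..($n - 2) cover the lines before line $n;
# indices $n..($lines.Count - 1) cover the lines after it.
$kept = $lines[0..($n - 2) + $n..($lines.Count - 1)]
$kept

# An alternative that filters the full index range instead of joining two
# ranges. -ne applied to an array returns the elements that differ.
$keptSafe = $lines[(0..($lines.Count - 1)) -ne ($n - 1)]
$keptSafe
```

Both commands should output one, two, four, and five. The two-range form assumes that line $n is neither the first nor the last line: PowerShell ranges count downward when the start exceeds the end, so 0..-1 yields 0, -1 rather than an empty range. The filtering form avoids that case, at the cost of building the full index array first.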

Finally, a memory-friendly approach that is faster than the pipeline-based one:

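$file = 'file.txt'
# Open the output file for writing; CreateText() writes UTF-8 without a BOM.
$outFile = [IO.File]::CreateText("$PWD/new-file.txt")
$lineNo = 0
try {
  # ReadLines() enumerates the input lazily, one line at a time.
  foreach ($line in [IO.File]::ReadLines("$PWD/$file")) {
    if (++$lineNo -eq 1005) { continue }  # skip the line to delete
    $outFile.WriteLine($line)
  }
} finally {
  # Always close the output file, even if an error occurs.
  $outFile.Dispose()
}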

As with the pipeline-based command, you may have to replace the original file with the new file afterwards.
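The streaming approach and the subsequent file replacement can be combined into one reusable function. The following is only a sketch: the function name Remove-LineByNumber, its parameters, and the ".tmp" suffix for the temporary file are all made up for illustration and are not part of the original answer.

```powershell
function Remove-LineByNumber {   # hypothetical helper name
  param(
    [Parameter(Mandatory)] [string] $Path,
    [Parameter(Mandatory)] [int] $LineNumber   # 1-based
  )
  # Resolve to a full path, because .NET does not share PowerShell's
  # notion of the current directory.
  $fullPath = Convert-Path -LiteralPath $Path
  $tempPath = "$fullPath.tmp"   # assumed not to exist already

  $writer = [IO.File]::CreateText($tempPath)
  $lineNo = 0
  try {
    # Stream the input line by line, skipping the requested line.
    foreach ($line in [IO.File]::ReadLines($fullPath)) {
      if (++$lineNo -eq $LineNumber) { continue }
      $writer.WriteLine($line)
    }
  } finally {
    $writer.Dispose()
  }
  # Reached only if the copy succeeded: replace the original file.
  Move-Item -Force -LiteralPath $tempPath -Destination $fullPath
}
```

It could then be called as Remove-LineByNumber -Path file.txt -LineNumber 1005. As in the original snippet, the output is written as UTF-8 without a BOM, regardless of the input file's encoding.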

