Windows PowerShell - delete a line by line number
Question
I have a large CSV file (1.6 GB). How can I delete a specific line, e.g. line 1005?
Recommended answer
Note: The solutions below remove a single line, identified by line number, from any text-based file. As marsze points out, additional considerations may apply to CSV files: care must be taken not to remove the header row, and a row may span multiple lines if its values contain embedded newlines. In that case a CSV parser is the better choice.
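For the CSV case mentioned in the note, a minimal sketch along the following lines may help. It assumes the built-in Import-Csv and Export-Csv cmdlets are acceptable as the parser; the file names, column names and sample data are hypothetical. Records are counted instead of physical lines, so the header is never touched and a quoted value with an embedded newline still counts as one record.

```powershell
# Hypothetical sample data in the temp directory; record 2 contains an embedded newline.
$csvPath = Join-Path ([IO.Path]::GetTempPath()) 'sample-records.csv'
$outPath = Join-Path ([IO.Path]::GetTempPath()) 'sample-records-new.csv'
@'
id,comment
1,first
2,"spans
two lines"
3,third
'@ | Set-Content -Encoding UTF8 $csvPath

$recordToDrop = 2   # 1-based data record number; the header row is not counted
$recordNo = 0

# Stream the records through the pipeline and skip the unwanted one.
Import-Csv $csvPath |
  Where-Object { (++$recordNo) -ne $recordToDrop } |
  Export-Csv -NoTypeInformation -Encoding UTF8 $outPath

# Read the result back to inspect what is left.
$remaining = @(Import-Csv $outPath)
```

Export-Csv rewrites every record, so quoting in the output may differ from the input even though the data is the same.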
If performance isn't paramount, here's a memory-friendly, pipeline-based way to do it:
Get-Content file.txt |
Where-Object ReadCount -ne 1005 |
Set-Content -Encoding Utf8 new-file.txt
- Get-Content adds a (somewhat obscurely named) .ReadCount property to each line it outputs, which contains the 1-based line number.
- Note that Get-Content does not preserve the input file's character encoding, so you should control Set-Content's output encoding explicitly, as shown above, using UTF-8 as an example.
- Without reading the whole file into memory, you must write to a new file, at least temporarily; you can then replace the original file with the temporary output file:
Move-Item -Force new-file.txt file.txt
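To see the .ReadCount property and the replace step working together, the following sketch runs the same pipeline on a three-line scratch file. The file names and contents are hypothetical, and line 2 stands in for line 1005.

```powershell
# Hypothetical scratch files in the temp directory, so no real data is touched.
$dir = [IO.Path]::GetTempPath()
$file = Join-Path $dir 'readcount-demo.txt'
$newFile = Join-Path $dir 'readcount-demo.new.txt'
'alpha', 'beta', 'gamma' | Set-Content -Encoding UTF8 $file

# Each line carries its 1-based line number in .ReadCount.
$numbered = Get-Content $file | ForEach-Object { '{0}: {1}' -f $_.ReadCount, $_ }
$numbered

# Stream everything except line 2 into a new file, then swap it in.
Get-Content $file |
  Where-Object ReadCount -ne 2 |
  Set-Content -Encoding UTF8 $newFile
Move-Item -Force $newFile $file

# Read the result back to inspect what is left.
$result = @(Get-Content $file)
```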
A faster but memory-intensive alternative, based on direct use of the .NET Framework, also lets you update the file in place:
$file = 'file.txt'
$lines = [IO.File]::ReadAllLines("$PWD/$file")
Set-Content -Encoding UTF8 $file -Value $lines[0..1003 + 1005..($lines.Count-1)]
- Note the need to use "$PWD/$file", i.e., to explicitly prepend the current directory path to the relative path stored in $file, because the .NET Framework's idea of what the current directory is differs from PowerShell's.
- While $lines = Get-Content $file would be functionally equivalent to $lines = [IO.File]::ReadAllLines("$PWD/$file"), it would perform noticeably worse.
- 0..1003 creates an array of indices from 0 to 1003; + concatenates that array with the indices from 1005 through the end of the input array. Array indices are 0-based, whereas line numbers are 1-based, so the omitted index 1004 is the one that holds line 1005.
- Also note how the resulting array is passed to Set-Content as a direct argument via -Value, which is faster than passing it via the pipeline (... | Set-Content ...), where element-by-element processing would be performed.
Finally, a memory-friendly approach that is faster than the pipeline-based one:
$file = 'file.txt'
$outFile = [IO.File]::CreateText("$PWD/new-file.txt")
$lineNo = 0
try {
  foreach ($line in [IO.File]::ReadLines("$PWD/$file")) {
    if (++$lineNo -eq 1005) { continue }
    $outFile.WriteLine($line)
  }
} finally {
  $outFile.Dispose()
}
As with the pipeline-based command, you may have to replace the original file with the new file afterwards.
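The streaming approach and the replace step could be combined into one reusable function, roughly as sketched below. Remove-LineByNumber is a hypothetical helper name, not a built-in cmdlet, and the ".tmp" suffix for the temporary file is likewise an assumption. [IO.File]::CreateText writes UTF-8 without a byte-order mark, which may differ from the original file's encoding.

```powershell
function Remove-LineByNumber {
  # Hypothetical helper (not a built-in cmdlet): streams a file to a temporary
  # copy without the given 1-based line, then replaces the original.
  param(
    [Parameter(Mandatory)] [string] $Path,
    [Parameter(Mandatory)] [int] $LineNumber
  )
  # Resolve to a full path first, because .NET does not use PowerShell's current directory.
  $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
  $tempPath = "$fullPath.tmp"   # assumed naming scheme for the temporary file
  $writer = [IO.File]::CreateText($tempPath)
  $lineNo = 0
  try {
    foreach ($line in [IO.File]::ReadLines($fullPath)) {
      if (++$lineNo -eq $LineNumber) { continue }
      $writer.WriteLine($line)
    }
  } finally {
    $writer.Dispose()
  }
  Move-Item -Force -LiteralPath $tempPath -Destination $fullPath
}

# Usage on a hypothetical five-line scratch file: drop line 3.
$sample = Join-Path ([IO.Path]::GetTempPath()) 'remove-line-demo.txt'
1..5 | ForEach-Object { "line $_" } | Set-Content -Encoding UTF8 $sample
Remove-LineByNumber -Path $sample -LineNumber 3
$after = @(Get-Content $sample)
```

Because only one line is held in memory at a time, this should scale to files like the 1.6 GB one in the question, at the cost of temporarily needing disk space for a second copy.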