Get all lines containing a string in a huge text file - as fast as possible?
Problem description
In PowerShell, how can I read a huge text file (about 200,000 lines / 30 MB) and get, as fast as possible, the last line (or all the lines) containing a specific string? I'm using:
get-content myfile.txt | select-string -pattern "my_string" -encoding ASCII | select -last 1
But it's very slow (about 16-18 seconds). I also tested without the last pipeline stage, "select -last 1", but it takes the same time.
Is there a faster way to get the last occurrence (or all occurrences) of a specific string in a huge file?
Perhaps that's simply how long it takes... Or is there any possibility of reading the file from the end, which should be faster since I only want the last occurrence? Thanks
Recommended answer
Try this:
get-content myfile.txt -ReadCount 1000 |
foreach { $_ -match "my_string" }
That will read your file in chunks of 1000 lines at a time and find the matches in each chunk. Because each chunk arrives as an array, -match acts as a filter and returns the matching lines rather than a Boolean. This gives you better performance because you aren't wasting a lot of CPU time on memory management, since there are only 1000 lines in the pipeline at a time.
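Since the question asks mainly for the last occurrence, here is a minimal sketch of how the chunked read could be extended to return it. The sample file, its name, and the variable names are assumptions made up for illustration; only the Get-Content -ReadCount / -match combination comes from the answer. Wrapping the chunk in @() is a defensive choice so that -match always filters an array, and [regex]::Escape is used because -match treats its right-hand side as a regular expression.

```powershell
# Build a small sample file so the sketch runs on its own (hypothetical data).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample_log.txt'
1..5000 |
    ForEach-Object { if ($_ % 1000 -eq 0) { "line $_ my_string" } else { "line $_" } } |
    Set-Content -Path $path -Encoding ASCII

# -match takes a regex, so escape the search text to match it literally.
$pattern = [regex]::Escape('my_string')

# Read in chunks of 1000 lines; @($_) guarantees an array, so -match
# returns the matching lines instead of True/False.
$hits = Get-Content -Path $path -ReadCount 1000 |
    ForEach-Object { @($_) -match $pattern }

# Keep only the last matching line.
$last = $hits | Select-Object -Last 1
$last
```

Note that this still scans the whole file; Select-Object -Last 1 only discards the earlier matches, which is consistent with the asker's observation that dropping it did not change the run time.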