Get all lines containing a string in a huge text file - as fast as possible?

Question

In PowerShell, how can I read a huge text file (about 200,000 lines / 30 MB) and get, as fast as possible, the last line (or all the lines) containing a specific string? I'm using:

get-content myfile.txt | select-string -pattern "my_string" -encoding ASCII | select -last 1

But it takes a very long time (about 16-18 seconds). I also tested without the last pipeline stage, "select -last 1", and the time is the same.

Is there a faster way to get the last occurrence (or all occurrences) of a specific string in a huge file?

Perhaps that is simply the time it needs... Or is there any possibility of reading the file faster from the end, since I want the last occurrence? Thanks

Recommended answer

Try this:

get-content myfile.txt -ReadCount 1000 |
 foreach { $_ -match "my_string" }

That reads your file in chunks of 1000 lines at a time and finds the matches in each chunk. Because each $_ is then an array of lines, -match returns the lines that match rather than a single True/False. This gives you better performance because you aren't wasting a lot of CPU time on memory management: the pipeline passes one 1000-line array at a time instead of 200,000 individual line objects.
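
Since the question asks for the last occurrence in particular, here is a minimal sketch of how the chunked read above could be extended to collect every match and then pick the last one. The temporary file, its contents, and the variable names ($path, $pattern, $allMatches, $lastMatch) are assumptions made only so the example runs on its own; they are not part of the original answer.

```powershell
# Hypothetical sample file so the sketch is self-contained; substitute your own path.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount_demo.txt'
1..5000 | ForEach-Object {
    if ($_ % 1000 -eq 0) { "line $_ my_string" } else { "line $_" }
} | Set-Content -Path $path -Encoding ASCII

# -match treats its right-hand side as a regular expression,
# so escape the search text when a literal match is wanted.
$pattern = [regex]::Escape('my_string')

# Each $_ is an array of up to 1000 lines; -match on an array
# returns the elements that match instead of a Boolean.
$allMatches = @(Get-Content -Path $path -ReadCount 1000 |
    ForEach-Object { $_ -match $pattern })

# The last occurrence is the last element of the collected matches.
$lastMatch = $allMatches[-1]

# Clean up the hypothetical sample file.
Remove-Item -Path $path
```

Select-String also accepts a file directly through -Path, which avoids piping every line through Get-Content. Whether that is faster on a given file is something to measure rather than assume, for example by wrapping each variant in Measure-Command.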
