What is the fastest way to remove a number from the beginning of so many files?


Problem description

I have 1000 files, each with one million lines. Each line has the following form:

a number,a text

I want to remove the leading number, including the comma that follows it, from every line of every file.

Example:

14671823,aboasdyflj -> aboasdyflj

What I'm doing now is:

os.system("sed -i -- 's/^.*,//g' data/*")

It works fine, but it takes a huge amount of time.

What is the fastest way to do this?

I'm coding in Python.

Recommended answer

This is much faster:

cut -f2 -d ',' data.txt > tmp.txt && mv tmp.txt data.txt

On a file with 11 million rows, it took less than one second.

To use this on several files in a directory, use:

TMP=/pathto/tmpfile
for file in dir/*; do
    # Quote the variables so paths containing spaces are handled correctly
    cut -f2 -d ',' "$file" > "$TMP" && mv "$TMP" "$file"
done
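
Since the question is driven from Python, the loop above could also be launched without going through a shell. The following is a minimal sketch, assuming a POSIX `cut` is available on the PATH. The function name `strip_leading_field` is hypothetical and not part of the original answer. It writes each result to a temporary file in the same directory and then renames it over the original, mirroring the `cut ... > $TMP && mv $TMP "$file"` step.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path


def strip_leading_field(directory):
    """Run `cut -f2 -d ','` on every regular file in `directory` (hypothetical helper)."""
    # sorted() lists the directory up front, so the temporary files
    # created below are never picked up by the loop.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # Create the temporary file next to the original so the final
        # rename stays on the same filesystem.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent)
        try:
            with os.fdopen(fd, "wb") as tmp:
                # Same command as in the answer; check=True raises on failure,
                # which plays the role of `&&` in the shell version.
                subprocess.run(["cut", "-f2", "-d", ",", str(path)],
                               stdout=tmp, check=True)
            # mkstemp creates the file with mode 0600; copy the original mode.
            shutil.copymode(path, tmp_name)
            os.replace(tmp_name, path)
        except BaseException:
            os.unlink(tmp_name)
            raise
```

Calling `strip_leading_field("data")` would process the `data` directory from the question.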

It is worth mentioning that editing in place often takes much longer than writing to a separate file. I tried your sed command but switched from in-place editing to a temporary file, and the total time went down from 26 s to 9 s.
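
The write-to-a-separate-file pattern can also be written in plain Python, without `sed` or `cut`. This is only a sketch of the pattern and makes no claim about speed relative to the timings quoted here. The name `strip_number_prefix` is hypothetical. Unlike `cut -f2`, it keeps everything after the first comma, so text that itself contains commas stays intact.

```python
import os
import tempfile


def strip_number_prefix(path):
    """Rewrite `path` without the leading `number,` on each line (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a separate file first instead of editing in place.
    fd, tmp_name = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            for line in src:
                # partition splits at the first comma; lines without a
                # comma are copied unchanged.
                head, sep, tail = line.partition(b",")
                dst.write(tail if sep else line)
        # Swap the finished file into place in one step.
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

Working on bytes avoids decoding each line, which matters only in that the file's encoding does not need to be known.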
