Fastest way to convert a tab-delimited file to CSV in Linux


Question

I have a tab-delimited file with over 200 million lines. What is the fastest way to convert it to a CSV file on Linux? The file has several lines of header information that I will need to strip out down the road, but the number of header lines is known. I have seen suggestions for sed and gawk, but I wonder whether there is a "preferred" choice.

Just to clarify, there are no embedded tabs in this file.

Solution

If all you need to do is translate all tab characters to comma characters, tr is probably the way to go.

Here printf emits a literal tab between the two words, and tr turns it into a comma:

$ printf 'hello\tworld\n' | tr '\t' ','
hello,world

Of course, if you have embedded tabs inside string literals in the file, this will incorrectly translate those as well; but embedded literal tabs would be fairly uncommon.
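Since the question also mentions a known number of header lines, the header stripping and the tab-to-comma translation can be combined in one pipeline. The following is a minimal sketch, not a benchmarked recommendation: the file names `sample.tsv` and `sample.csv` and the header count of 2 are hypothetical placeholders, and it assumes, as the question states, that no field contains an embedded tab (it also assumes no field contains a comma, which plain `tr` would not quote).

```shell
# Hypothetical file names and header count; adjust to your data.
header_lines=2
input=sample.tsv
output=sample.csv

# Build a tiny sample: two header lines, then two tab-separated data rows.
printf 'header one\nheader two\na\tb\tc\n1\t2\t3\n' > "$input"

# tail -n +K starts output at line K, so K = header_lines + 1 skips the header.
# tr then replaces every tab character with a comma.
tail -n +"$((header_lines + 1))" "$input" | tr '\t' ',' > "$output"

cat "$output"
```

Both tools stream their input, so the pipeline does not need to hold the whole file in memory, which matters for a file of this size.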
