Splitting CSV file and excluding column in output using bash, sed or awk
Question
I have a CSV file which contains data like the following:
1,275,,,275,17.3,0,"2011-05-09 20:21:45"
2,279,,,279,17.3,0,"2011-05-10 20:21:52"
3,276,,,276,17.3,0,"2011-05-11 20:21:58"
4,272,,,272,17.3,0,"2011-05-12 20:22:04"
5,272,,,272,17.3,0,"2011-05-13 20:22:10"
6,278,,,278,17.3,0,"2011-05-13 20:24:08"
7,270,,,270,17.3,0,"2011-05-13 20:24:14"
8,269,,,269,17.3,0,"2011-05-14 20:24:20"
9,278,,,278,17.3,0,"2011-05-14 20:24:26"
This file contains 4432986 rows of data.
I want to split the file, naming each new file after the date in the last column.
So, based on the data above, I would want 6 new files, each containing the rows for one day.
I would like the files named in YYYY_MM_DD format.
I would also like to leave the first column out of the output data.
So the file 2011_05_13 would contain the following rows, with the first column excluded:
272,,,272,17.3,0,"2011-05-13 20:22:10"
278,,,278,17.3,0,"2011-05-13 20:24:08"
270,,,270,17.3,0,"2011-05-13 20:24:14"
I am planning on doing this on a Linux box, so anything using standard Linux utilities (sed, awk, etc.) would be great.
Recommended answer
Here's a one-liner for you in awk:
awk -F, '{ split ($8,array," "); sub ("\"","",array[1]); sub (NR,"",$0); sub (",","",$0); print $0 > array[1] }' file.txt
Desired output achieved, although perhaps some of this code could be made more succinct. HTH.
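As one possible way to make it more succinct, the sketch below takes the date straight out of the eighth field with substr and prints everything after the first comma. It assumes the last column always looks like "YYYY-MM-DD hh:mm:ss" including the surrounding quotes, and that no earlier field contains a comma. The sample.csv name and the scratch directory are made up for the demonstration.

```shell
# Work in a scratch directory so the demo does not touch real files.
cd "$(mktemp -d)"

# Three sample rows from the question (sample.csv is a hypothetical name).
cat > sample.csv <<'EOF'
4,272,,,272,17.3,0,"2011-05-12 20:22:04"
5,272,,,272,17.3,0,"2011-05-13 20:22:10"
6,278,,,278,17.3,0,"2011-05-13 20:24:08"
EOF

awk -F, '{
    out = substr($8, 2, 10)                     # skip the opening quote, keep the 10-character date
    print substr($0, index($0, ",") + 1) > out  # everything after the first comma
}' sample.csv

# Show the rows collected for 13 May.
cat 2011-05-13
```

This names the files with hyphens, as the one-liner above does; the file name is computed before anything else so that it does not depend on how the line is later trimmed.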
EDIT:
Reading the code from left to right:
-F ","
Yes, this sets the delimiter.
split ($8,array," ")
This splits the eighth column on the space and puts the pieces into an array named array.
sub ("\"","",array[1])
We take the first array element (this is the slice that's going to become our output file name) and substitute out the leading " symbol. (We need to escape the " symbol, so we put the \ character in front.)
sub (NR,"",$0)
This conveniently removes the line number from the beginning of each line (NR is the row number, and $0 is of course the whole line of input before delimitation).
sub (",","",$0)
This removes the comma after the row number.
Now that we have a clean filename and a clean row of data, we can write $0 to array[1]: print $0 > array[1].
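One thing worth noting about the steps above: sub (NR,"",$0) treats the line number as a regular expression and deletes its first match, so it only strips the first column when that column happens to equal the line number, as it does in the sample data. A minimal sketch of the difference, using a made-up row whose ID is 10 on line 1, next to an anchored pattern that removes everything up to the first comma whatever its value:

```shell
# Hypothetical row: the ID (10) does not match its line number (1).
row='10,275,,,275,17.3,0,"2011-05-09 20:21:45"'

# Original approach: delete the first match of NR (here "1"), then the first comma.
printf '%s\n' "$row" | awk '{ sub (NR,"",$0); sub (",","",$0); print }'

# Anchored alternative: delete everything up to and including the first comma.
printf '%s\n' "$row" | awk '{ sub (/^[^,]*,/,""); print }'
```

The first command leaves a stray 0 glued onto the next field, while the second prints the row starting at 275. For input like the question's, where the first column is a running line number, both approaches give the same result.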
FIX:
So if you'd prefer an underscore instead of a hyphen, all we need to fix is array[1]. I've just added in a global substitution: gsub ("-","_",array[1]).
The updated code is:
awk -F, '{ split ($8,array," "); sub ("\"","",array[1]); gsub ("-","_",array[1]); sub (NR,"",$0); sub (",","",$0); print $0 > array[1] }' file.txt
HTH.
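To check the updated command against the sample rows from the question, something like the following could be run in an empty scratch directory. The mktemp directory is an assumption made for the demo; file.txt is the name used in the answer.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
cd "$(mktemp -d)"

# The nine sample rows from the question.
cat > file.txt <<'EOF'
1,275,,,275,17.3,0,"2011-05-09 20:21:45"
2,279,,,279,17.3,0,"2011-05-10 20:21:52"
3,276,,,276,17.3,0,"2011-05-11 20:21:58"
4,272,,,272,17.3,0,"2011-05-12 20:22:04"
5,272,,,272,17.3,0,"2011-05-13 20:22:10"
6,278,,,278,17.3,0,"2011-05-13 20:24:08"
7,270,,,270,17.3,0,"2011-05-13 20:24:14"
8,269,,,269,17.3,0,"2011-05-14 20:24:20"
9,278,,,278,17.3,0,"2011-05-14 20:24:26"
EOF

awk -F, '{ split ($8,array," "); sub ("\"","",array[1]); gsub ("-","_",array[1]); sub (NR,"",$0); sub (",","",$0); print $0 > array[1] }' file.txt

ls 2011_*        # one file per day, 2011_05_09 through 2011_05_14
cat 2011_05_13   # the three rows for 13 May, without the first column
```

With this input the command writes six files, and 2011_05_13 holds the same three rows the question lists as the expected output.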