Splitting CSV file and excluding column in output using bash, sed or awk

Published: 2016/7/28 16:50:16 · Tags: linux, bash, csv, sed, awk


Problem description

I have a CSV file that contains data like the following:

1,275,,,275,17.3,0,"2011-05-09 20:21:45"
2,279,,,279,17.3,0,"2011-05-10 20:21:52"
3,276,,,276,17.3,0,"2011-05-11 20:21:58"
4,272,,,272,17.3,0,"2011-05-12 20:22:04"
5,272,,,272,17.3,0,"2011-05-13 20:22:10"
6,278,,,278,17.3,0,"2011-05-13 20:24:08"
7,270,,,270,17.3,0,"2011-05-13 20:24:14"
8,269,,,269,17.3,0,"2011-05-14 20:24:20"
9,278,,,278,17.3,0,"2011-05-14 20:24:26"

This file contains 4432986 rows of data.

I wish to split the file, basing each new file name on the date in the last column.

Therefore, based on the data above, I would want 6 new files, with the rows for each day in each file.

I would like the files named in YYYY_MM_DD format.

I would also like to ignore the first column in the output data.

So file 2011_05_13 would contain the following rows, with the first column excluded:

272,,,272,17.3,0,"2011-05-13 20:22:10"
278,,,278,17.3,0,"2011-05-13 20:24:08"
270,,,270,17.3,0,"2011-05-13 20:24:14"

I am planning on doing this on a Linux box, so anything using standard Linux utilities (sed, awk, etc.) would be cool.

Recommended answer

Here's a one-liner for you in awk:

awk -F "," '{ split ($8,array," "); sub ("\"","",array[1]); sub (NR,"",$0); sub (",","",$0); print $0 > array[1] }' file.txt

Desired output achieved, although perhaps some of this code could be made more succinct. HTH.

EDIT:

Read the code from left to right:

-F ","
Yes, this sets the delimiter.

split ($8,array," ")
This splits the eighth column on the space and puts the pieces into an array called array.

sub ("\"","",array[1])
We take the first array element (this is the slice that's going to become our output file name) and substitute out the leading " symbol. (We need to escape the " symbol, so we put the \ character in front of it.)
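To see what these first two steps produce on their own, here is a small sketch that feeds a single sample row from the question through split and sub and prints the resulting name. The shell variables line and name are just illustrative names, not part of the original one-liner.

```shell
# One sample row from the question.
line='5,272,,,272,17.3,0,"2011-05-13 20:22:10"'

name=$(printf '%s\n' "$line" | awk -F "," '{
    split($8, array, " ")      # array[1] is the quoted date part, array[2] the time part
    sub("\"", "", array[1])    # drop the leading double quote from the date part
    print array[1]
}')

echo "$name"
```

The date still contains hyphens at this point.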

sub (NR,"",$0)
This conveniently removes the line number from the beginning of each line (NR is the row number and $0 is of course the whole line of input before delimitation).

sub (",","",$0)
This removes the comma after the row number.
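One caveat: sub (NR,"",$0) removes the first match of the current line number, so it only strips the first column when that column really equals the row number, as it does in the sample data. If that can't be guaranteed, a possible alternative (my own variation, not part of the one-liner above) is a single anchored regex that drops everything up to and including the first comma:

```shell
# A row whose first column (7) does not equal its line number in this input (1).
line='7,270,,,270,17.3,0,"2011-05-13 20:24:14"'

# Remove the first field and its trailing comma, whatever the field contains.
stripped=$(printf '%s\n' "$line" | awk '{ sub(/^[^,]*,/, ""); print }')

echo "$stripped"
```

This assumes the first field never contains a quoted comma, which holds for the data shown in the question.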

Now that we have a clean filename and a clean row of data, we can write $0 to the file named by array[1]: print $0 > array[1].
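awk keeps every file it writes to open until the program ends or close() is called. With 4432986 rows spread over many dates, some awk implementations may run into the open-file limit. A rough sketch of one way around that, assuming the rows are already grouped by date (as in the sample) and using my own variable names out and prev:

```shell
# Work in a throwaway directory; the three rows are taken from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file.txt <<'EOF'
4,272,,,272,17.3,0,"2011-05-12 20:22:04"
5,272,,,272,17.3,0,"2011-05-13 20:22:10"
6,278,,,278,17.3,0,"2011-05-13 20:24:08"
EOF

awk -F "," '{
    split($8, array, " ")
    sub("\"", "", array[1])
    out = array[1]
    # When the date changes, release the previous output file.
    if (out != prev) {
        if (prev != "") close(prev)
        prev = out
    }
    sub(/^[^,]*,/, "")    # drop the first column (anchored regex instead of sub with NR)
    print >> out          # append, so a reopened file is not truncated
}' file.txt

cat 2011-05-13
```

Because this appends with >>, running it twice in the same directory would duplicate rows, so start from an empty output directory.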

FIX:

So if you'd prefer an underscore instead of a hyphen, all we need to fix is array[1]. I've just added in a global substitution: gsub ("-","_",array[1]).

The updated code is:

awk -F "," '{ split ($8,array," "); sub ("\"","",array[1]); gsub ("-","_",array[1]); sub (NR,"",$0); sub (",","",$0); print $0 > array[1] }' file.txt
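For a quick check, the command with the underscore fix can be run against the nine sample rows from the question. The temporary directory is only there to keep the generated files in one place, and file.txt stands in for the real input.

```shell
# Work in a throwaway directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The nine sample rows from the question.
cat > file.txt <<'EOF'
1,275,,,275,17.3,0,"2011-05-09 20:21:45"
2,279,,,279,17.3,0,"2011-05-10 20:21:52"
3,276,,,276,17.3,0,"2011-05-11 20:21:58"
4,272,,,272,17.3,0,"2011-05-12 20:22:04"
5,272,,,272,17.3,0,"2011-05-13 20:22:10"
6,278,,,278,17.3,0,"2011-05-13 20:24:08"
7,270,,,270,17.3,0,"2011-05-13 20:24:14"
8,269,,,269,17.3,0,"2011-05-14 20:24:20"
9,278,,,278,17.3,0,"2011-05-14 20:24:26"
EOF

# Split by date, name the files YYYY_MM_DD, and drop the first column.
awk -F "," '{ split ($8,array," "); sub ("\"","",array[1]); gsub ("-","_",array[1]); sub (NR,"",$0); sub (",","",$0); print $0 > array[1] }' file.txt

ls
cat 2011_05_13
```

On this sample there should be one file per date, and 2011_05_13 should hold the three rows listed in the question.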

HTH.
