Split CSV file in bash into multiple files based on condition

Question

My CSV file contains multiple rows of data, and I want to split it into multiple files based on one attribute.

beeline -u jdbc:hive2:<MYHOST> -n <USER> -p <PASSWORD> --silent=true --outputformat=csv2 -f <SQL FILE> > "result_$(date +%Y%m%d_%H%M%S).csv"

The SQL query (which uses ORDER BY ID) is run through beeline, which writes a single CSV file.


cat sql.csv
"attr;attr;ID;attr"
"data;data;XXXX;date"
"data;data;XXXX;date"
"data;data;YYYYY;date"
"data;data;YYYYY;date"
"data;data;BBBBB;date"
"data;data;BBBBB;date"

The desired result is to start a new file each time a new ID appears, and to use that ID in the filename.

file_1_ID_XXXX_+%Y%m%d_%H%M%S:


attr   attr    ID  attr
data    data    XXXX    date
data    data    XXXX    date

file_2_ID_YYYYY_+%Y%m%d_%H%M%S:


attr   attr    ID  attr
data    data    YYYYY   date
data    data    YYYYY   date

Recommended answer

If I understand your question, you can take the CSV file produced by the SQL query and split it into the three files implied by your data (one per ID) simply by using a few variables, string concatenation, and output redirection, e.g.

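awk -v n=1 -v dt="$(date '+%Y%m%d_%H%M%S')" '
    FNR == 1 { hdg = $0; next }          # save the header line
    $3 != id {                           # ID changed: start a new output file
        if (name) close(name)            # close the previous file to free its descriptor
        id = $3; name = "file_" n "_ID_" id "_" dt; n++
        print hdg > name
    }
    { print > name }                     # copy the data line to the current file
' sqldata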
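The script relies on rows with the same ID being adjacent, which the question's ORDER BY ID guarantees; with unsorted input it would start a new numbered file every time the ID changes, so one ID could end up spread over several files. If that ordering cannot be relied on, one option is to key the output files by ID with an awk associative array. This is a minimal sketch under that assumption; the file name sqldata_unsorted and its interleaved rows are hypothetical sample data in the same whitespace-separated layout as the sqldata example.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: same layout as sqldata, but with the IDs interleaved.
cat > sqldata_unsorted <<'EOF'
attr    attr    ID  attr
data    data    XXXX    date
data    data    YYYYY   date
data    data    XXXX    date
data    data    YYYYY   date
EOF

awk -v dt="$(date '+%Y%m%d_%H%M%S')" '
    FNR == 1 { hdg = $0; next }                    # save the header line
    !($3 in out) {                                 # first row seen for this ID
        out[$3] = "file_" (++n) "_ID_" $3 "_" dt   # number files by first appearance
        print hdg > out[$3]                        # each file starts with the header
    }
    { print > out[$3] }                            # files stay open, so later rows are appended
' sqldata_unsorted
```

Because every output file stays open until awk exits, a very large number of distinct IDs may exceed the open-file limit of some awk implementations.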

Example input file

Where the sqldata file contains (whitespace-separated columns, as expected by awk's default field splitting):

$ cat sqldata
attr    attr    ID  attr
data    data    XXXX    date
data    data    XXXX    date
data    data    YYYYY   date
data    data    YYYYY   date
data    data    BBBBB   date
data    data    BBBBB   date

Example use/output files

Simply copying the awk script and middle-mouse pasting it into the terminal, with the correct input filename, produces the following three output files:

$ cat file_1_ID_XXXX_20190805_033514
attr    attr    ID  attr
data    data    XXXX    date
data    data    XXXX    date

$ cat file_2_ID_YYYYY_20190805_033514
attr    attr    ID  attr
data    data    YYYYY   date
data    data    YYYYY   date

$ cat file_3_ID_BBBBB_20190805_033514
attr    attr    ID  attr
data    data    BBBBB   date
data    data    BBBBB   date
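The answer's sample file sqldata is whitespace-separated, whereas the question's sql.csv wraps each row in double quotes and separates fields with semicolons. A minimal sketch adapting the same approach to that format follows, assuming no field itself contains a `;` or `"`, and assuming tab-separated output is what the desired files in the question show. The sample rows are copied from the question; in practice you would point awk at the beeline result file instead of sql.csv.

```shell
#!/usr/bin/env bash
# Sample of the question's beeline output: each row is wrapped in double
# quotes and the fields are separated by semicolons.
cat > sql.csv <<'EOF'
"attr;attr;ID;attr"
"data;data;XXXX;date"
"data;data;XXXX;date"
"data;data;YYYYY;date"
"data;data;YYYYY;date"
"data;data;BBBBB;date"
"data;data;BBBBB;date"
EOF

awk -F';' -v OFS='\t' -v n=1 -v dt="$(date '+%Y%m%d_%H%M%S')" '
    { gsub(/"/, ""); $1 = $1 }       # strip the quotes (fields re-split on ";"), then rejoin $0 with tabs
    FNR == 1 { hdg = $0; next }      # save the converted header line
    $3 != id {                       # ID column changed: start a new output file
        if (name) close(name)        # close the previous file to free its descriptor
        id = $3; name = "file_" n "_ID_" id "_" dt; n++
        print hdg > name
    }
    { print > name }                 # write the converted row
' sql.csv
```

Closing each file before opening the next keeps only one output file open at a time, which is safe here only because ORDER BY ID keeps each ID's rows together.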

Look things over and let me know if this is what you intended. If not, I'm happy to help further.
