awk output to file based on filter


Problem description

I have a large CSV file that I need to split into separate pieces based on the value in one of its columns. My input file, dataset.csv, looks something like this:

NOTE: edited to clarify that the data is formatted as ,data, with no spaces.

action,action_type,Result
up,1,stringA
down,1,strinB
left,2,stringC

So, to split by action_type I simply do the following (I need the whole matching line in the resulting file):

awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv

This works as expected, but I am basically traversing my original dataset twice. The original dataset is about 5GB and has 30 action_type categories, so this approach would mean 30 passes over the file. I need to do this every day, so I need a script that runs on its own and runs efficiently.

I tried the following, but it does not work:

# This is a file called myFilter.awk

{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}

Then I run it as:

awk -f myFilter.awk dataset.csv

But I get nothing. Literally nothing, not even errors. That suggests either that my code is simply not matching anything, or that my print / pipe statement is wrong.

Recommended answer

You can do this with a single awk command that reads the input only once:

awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
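The same idea can be written as a script file, which also shows the two things the myFilter.awk attempt in the question was missing: a comma field separator (without it, $2 is not the action_type column) and an output file name given as a string rather than a bare word. The following is only a sketch under stated assumptions: the scratch directory and the four-line sample dataset.csv are made up for illustration, and it is a variant of the one-liner above, not the same command. It uses > and keeps the output files open instead of using >> with close(fn), which avoids reopening a file for every line but assumes the awk in use can hold about 30 files open at once.

```shell
#!/bin/sh
set -eu

# Scratch directory so the sketch does not overwrite real files (assumption).
workdir=$(mktemp -d)
cd "$workdir"

# Small sample input modelled on the question's data (illustrative only).
cat > dataset.csv <<'EOF'
action,action_type,Result
up,1,stringA
down,1,strinB
left,2,stringC
EOF

# Corrected sketch of myFilter.awk.
cat > myFilter.awk <<'EOF'
BEGIN { FS = "," }            # split fields on commas; same effect as -F,
NR > 1 {                      # skip the header row
    fn = $2 "_dataset.csv"    # build the file name as a string from column 2
    print > fn                # ">" truncates on first open in this run, later prints append
}
EOF

awk -f myFilter.awk dataset.csv

# Show what ended up in each output file.
cat 1_dataset.csv
cat 2_dataset.csv
```

Because > starts each output file fresh on every run, a daily job does not accumulate rows from previous days; with >>, as in the one-liner, the output files would need to be removed before each run to get the same result. If the awk implementation complains about too many open files, the >> plus close(fn) form is the safer choice.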
