awk output to file based on filter
Question
I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that the data is ,data, with no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do the following (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this every day, so I need to script it to run on its own efficiently.
I tried the following, but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors. That suggests either my code is simply not matching anything or my print / pipe statement is wrong.
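One quick way to test the "not matching anything" suspicion is to look at how awk splits a sample line. The following is a minimal sketch, assuming a line with no spaces as described in the note: without -F, awk splits on whitespace, so the whole line lands in $1 and $2 is empty.

```shell
# Default field splitting is on whitespace, so the whole line is a single field.
printf 'up,1,stringA\n' | awk '{ print NF ": [" $2 "]" }'
# prints: 1: []

# With the comma separator, the second column is available as $2.
printf 'up,1,stringA\n' | awk -F, '{ print NF ": [" $2 "]" }'
# prints: 3: [1]
```

Since myFilter.awk is run without -F, and does not set FS itself, the comparison action_type=="1" never sees the value 1.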
Recommended answer
You may try this awk, which does the split in a single pass with a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
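The one-liner above closes each output file after every record, which keeps only one file open at a time but reopens a file for every line. With roughly 30 categories, leaving the files open may be a reasonable alternative. The following is a minimal sketch of that variant, written as a corrected myFilter.awk; it is not the answer author's method. The temporary directory and the three sample rows are assumptions for illustration only. It also sets the field separator and builds the file name as a string, both of which the script in the question lacks.

```shell
# Sketch under assumptions: a temp directory and sample rows stand in for the real 5GB file.
set -eu
cd "$(mktemp -d)"

# Small stand-in for dataset.csv (header plus three rows).
printf '%s\n' 'action,action_type,Result' 'up,1,stringA' 'down,1,strinB' 'left,2,stringC' > dataset.csv

# Corrected myFilter.awk: set the separator, skip the header, build the file name as a string.
cat > myFilter.awk <<'EOF'
BEGIN { FS = "," }
NR > 1 {
    fn = $2 "_dataset.csv"   # e.g. "1_dataset.csv"; the target must be a string expression
    print > fn               # ">" truncates on first open, then keeps writing while the file stays open
}
EOF

awk -f myFilter.awk dataset.csv

cat 1_dataset.csv
cat 2_dataset.csv
```

Using > instead of >> means each daily run starts from empty output files; with >> the new rows would be appended to whatever the previous run left behind. If the number of distinct action_type values grows large, some awk implementations may hit a limit on simultaneously open files, in which case the close(fn) form is the safer choice.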