How to close file in awk while generating a list of?
Question
I'm trying to find a way to avoid the awk error "too many open files". Here's my situation:
INPUT: an ASCII file with many lines, following this scheme:
NODE_212_lenght.._1
NODE_212_lenght.._2
NODE_213_lenght.._1
NODE_213_lenght.._2
To split this file so that all records with the same NODE number end up in the same output file, I used this awk one-liner:
awk -F "_" '{print >("orfs_for_node_" $2 "")}' <file
With a file made up of many lines, this command keeps saying "too many open files". I also tried splitting the input into chunks of 2k lines, with the same result. I can't really go below 2k lines per chunk, because the input is a huge file.
I know awk can close a file after writing to it, but I don't know how to do that. I tried adding a close() call:
awk -F "_" '{print >("orfs_for_node_" $2 ""); close(orfs_for_node_*)}' <file
but this produces no output.
Recommended answer
If you switch to GNU awk, it will handle this for you. Otherwise, this is the right syntax if your input file has all the lines for each $2 value grouped together:
awk -F '_' '{out="orfs_for_node_"$2} out!=prev{close(prev)} {print > out; prev=out}' file
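As a rough end-to-end sketch of the grouped case, the snippet below builds a small sample input and runs the command above on it. The file name `sample.txt` and the scratch directory are assumptions for illustration only; they do not come from the question.

```shell
# Work in a scratch directory so no existing files are touched (assumed setup).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample input: lines for each node number are already grouped.
printf '%s\n' \
  'NODE_212_lenght.._1' \
  'NODE_212_lenght.._2' \
  'NODE_213_lenght.._1' \
  'NODE_213_lenght.._2' > sample.txt

# Close the previous output file whenever the target file name changes,
# so only one output file is open at any time.
awk -F '_' '{out="orfs_for_node_"$2} out!=prev{close(prev)} {print > out; prev=out}' sample.txt

# Show what ended up in each output file.
cat orfs_for_node_212
cat orfs_for_node_213
```

Because `>` truncates a file the first time it is opened, this variant only works when a given output name is never reopened after being closed, which is exactly the grouped-input condition.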
Otherwise you need to use >> instead of >:
awk -F '_' '{out="orfs_for_node_"$2} out!=prev{close(prev)} {print >> out; prev=out}' file
Note that in this second case you need to empty any pre-existing output files (e.g. from a previous run) before running it, since it always appends to the output files.
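A minimal sketch of the append variant on ungrouped input might look like the following. The sample file name `sample.txt`, the scratch directory, and the use of `rm -f orfs_for_node_*` to clear earlier outputs are assumptions for illustration; the cleanup glob in particular assumes that nothing else you want to keep matches that prefix.

```shell
# Work in a scratch directory (assumed setup, not from the original post).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample input where lines for the same node are NOT adjacent.
printf '%s\n' \
  'NODE_212_lenght.._1' \
  'NODE_213_lenght.._1' \
  'NODE_212_lenght.._2' \
  'NODE_213_lenght.._2' > sample.txt

# Simulate a leftover output file from a previous run.
echo 'stale line' > orfs_for_node_212

# Remove earlier outputs first, because >> always appends.
rm -f orfs_for_node_*

# Close the previous file on every change of target; >> reopens in append
# mode, so lines written before the close are preserved.
awk -F '_' '{out="orfs_for_node_"$2} out!=prev{close(prev)} {print >> out; prev=out}' sample.txt

cat orfs_for_node_212
```

The trade-off is that interleaved input causes a close and reopen on nearly every line, which is slower but keeps the number of simultaneously open files at one.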