How to handle 3 files with awk?


Question

OK, so after spending 2 days, I am not able to solve it and I am almost out of time now. It might be a very silly question, so please bear with me. My awk script does something like this:

BEGIN { n = 50; i = n }
FNR == NR {
    # Read file-1, which has just 1 column
    ids[$1] = int(i++ / n)
    next
}
{
    # Read file-2, which has 4 columns
    # Do something
    next
}
END {...}

It works fine. But now I want to extend it to read 3 files. Let's say, instead of hard-coding the value of "n", I need to read a properties file and set value of "n" from that. I found this question and have tried something like this:

BEGIN { n = 0; i = 0 }
FNR == NR {
    # Block A
    # Try to read file-0
    next
}
{
    # Block B
    # Read file-1, which has just 1 column
    next
}
{
    # Block C
    # Read file-2, which has 4 columns
    # Do something
    next
}
END {...}

But it is not working. Block A is executed for file-0, and I am able to read the property from the properties file. But Block B is executed for both file-1 and file-2, and Block C is never executed.

Can someone please help me solve this? I have never used awk before and the syntax is very confusing. Also, if someone can explain how awk reads input from different files, that will be very helpful.

Please let me know if I need to add more details to the question.

Recommended answer

Update: The solution below works, as long as all input files are nonempty, but see @Ed Morton's answer for a simpler and more robust way of adding file-specific handling.

However, this answer still provides a hopefully helpful explanation of some awk basics and why the OP's approach didn't work.

Try the following (note that I've made the indices 1-based, as that's how awk does it):

awk '

 # Increment the current-file index, if a new file is being processed.
 FNR == 1 { ++fIndex }

 # Process current line if from 1st file.
 fIndex == 1 {
    print "file 1: " FILENAME
    next
 }

 # Process current line if from 2nd file.
 fIndex == 2 {
    print "file 2: " FILENAME
    next
 }

 # Process current line (from all remaining files).
 {
    print "file " fIndex ": " FILENAME
 }

' file-1 file-2 file-3

  • Pattern FNR==1 is true whenever awk starts processing a new input file (FNR contains the line number relative to the current input file).
  • Every time a new file starts processing, fIndex is incremented and thus reflects the 1-based index of the current input file. Tip of the hat to @twalberg's helpful answer.
      • Note that an uninitialized awk variable used in a numeric context defaults to 0, so there's no need to initialize fIndex (unless you want a different start value).
  • Patterns such as fIndex == 1 can then be used to execute blocks for lines from a specific input file only (assuming the block ends in next).
  • The last block is then executed for all input files that don't have file-specific blocks (above).

      As for why your approach didn't work:

      • Your 2nd and 3rd blocks are potentially executed unconditionally, for lines from all input files, because they are not preceded by a pattern (condition).

      So your 2nd block is entered for lines from all subsequent input files, and its next statement then prevents the 3rd block from ever getting reached.
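
The effect can be reproduced with a small trace. This is only a sketch: the three one-line files are made up, and each block merely prints its label and the current FILENAME instead of doing real work.

```shell
#!/bin/sh
# Three tiny, hypothetical input files, one line each.
tmp=$(mktemp -d)
printf 'p\n' > "$tmp/file-0"
printf 'x\n' > "$tmp/file-1"
printf 'y\n' > "$tmp/file-2"

trace=$(cd "$tmp" && awk '
  FNR == NR { print "A " FILENAME; next }   # true only while the 1st file is being read
            { print "B " FILENAME; next }   # no pattern: runs for every remaining line
            { print "C " FILENAME; next }   # never reached, because B always calls next
' file-0 file-1 file-2)

printf '%s\n' "$trace"
rm -rf "$tmp"
```

The trace shows "A file-0", "B file-1", and "B file-2", with no line for block C, which matches the behavior described in the question.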

      A potential misconception:

      • Perhaps you think that each block functions as a loop processing a single input file. This is NOT how awk works. Instead, the entire awk program is processed in a loop, with each iteration processing a single input line, starting with all lines from file 1, then from file 2, ...

      An awk program can have any number of blocks (typically preceded by patterns), and whether they're executed for the current input line is solely governed by whether the pattern evaluates to true; if there is no pattern, the block is executed unconditionally (across input files). However, as you've already discovered, next inside a block can be used to skip subsequent blocks (pattern-block pairs).
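
A short sketch of that per-line evaluation, using two made-up input lines fed through a pipe: every block is considered in order for each line, and next cuts the remaining blocks short for that line only.

```shell
#!/bin/sh
out=$(printf 'one\ntwo\n' | awk '
  { print "1st block: " $0 }               # no pattern: runs for every line
  /two/ { print "2nd block: " $0; next }   # runs only for matching lines; next skips the rest
  { print "3rd block: " $0 }               # runs unless an earlier block called next
')

printf '%s\n' "$out"
```

For the line "one" the 1st and 3rd blocks run; for the line "two" the 1st and 2nd blocks run, and next keeps the 3rd block from being reached.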
