grep: matching a specific position in lines using words from another file


Problem description

    I have two files.

    file1:

    12342015010198765hello
    12342015010188765hello
    12342015010178765hello
    

    Each line contains fields at fixed positions; for example, positions 13-17 hold the account_id.

    file2:

    98765
    88765
    

    which contains a list of account_ids.

    In Korn Shell, I want to print the lines from file1 whose positions 13-17 match one of the account_ids in file2.

    I can't do

    grep -f file2 file1
    

    because an account_id from file2 can also match other fields at other positions.

    I tried putting a pattern like this in file2:

    ^.{12}98765.*
    

    but it did not work.

    Solution

    Using awk

    $ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
    12342015010198765hello
    12342015010188765hello
    

    How it works

    • NR==FNR{a[$1]=1;next;}

      FNR is the number of lines read so far from the current file, and NR is the total number of lines read so far. Thus, FNR==NR holds only while we are reading the first file, which is file2.

      Each ID in file2 is saved as a key in array a. Then next skips the rest of the commands and jumps to the next line.

    • substr($0,13,5) in a

      If we reach this command, we are working on the second file, file1.

      This condition is true if the 5-character substring that starts at position 13 is a key in array a. If the condition is true, awk performs the default action, which is to print the line.
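The two awk rules above can be tried end to end with a small script. This is a minimal sketch, not part of the original answer: the scratch directory and the fourth line of file1 are assumptions added for illustration. That extra line carries 98765 in positions 5-9 rather than 13-17, so it shows the false positive that plain grep -f produces and that the awk version avoids.

```shell
#!/bin/sh
# Scratch directory for the sample inputs (location is illustrative).
tmpdir=$(mktemp -d)

# The first three lines come from the question; the fourth is a hypothetical
# line whose positions 5-9 contain 98765 while positions 13-17 hold 78765.
cat > "$tmpdir/file1" <<'EOF'
12342015010198765hello
12342015010188765hello
12342015010178765hello
12349876510178765hello
EOF

cat > "$tmpdir/file2" <<'EOF'
98765
88765
EOF

# Plain grep -f matches an ID anywhere in the line, so the fourth line slips in.
grep_out=$(grep -f "$tmpdir/file2" "$tmpdir/file1")

# awk: first pass stores the IDs from file2 as array keys,
# second pass prints lines of file1 whose columns 13-17 are one of those keys.
awk_out=$(awk 'NR==FNR{a[$1]=1;next} substr($0,13,5) in a' \
    "$tmpdir/file2" "$tmpdir/file1")

printf 'grep -f:\n%s\n\nawk:\n%s\n' "$grep_out" "$awk_out"

rm -rf "$tmpdir"
```

With these inputs, grep -f prints three lines (including the hypothetical fourth one), while awk prints only the two lines whose account_id field is listed in file2.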

    Using grep

    You mentioned trying this pattern in file2:

    ^.{12}98765.*
    

    The {12} interval is extended regex syntax, which means that -E is required (in basic regex it would have to be written \{12\}). Also, there is no value in matching .* at the end: it will always match. Thus, try:

    $ grep -E '^.{12}98765' file1
    12342015010198765hello
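As a side check on the -E point, the following sketch compares the extended form, the equivalent basic-regex form with escaped braces, and the unescaped pattern without -E. The scratch directory is an assumption for illustration; the three sample lines are the ones from the question.

```shell
#!/bin/sh
# Scratch copy of file1 from the question (location is illustrative).
tmpdir=$(mktemp -d)
printf '%s\n' \
    12342015010198765hello \
    12342015010188765hello \
    12342015010178765hello > "$tmpdir/file1"

# Extended regex: {12} is an interval, so this skips exactly 12 characters.
ere_out=$(grep -E '^.{12}98765' "$tmpdir/file1")

# Basic regex: the same interval must be written with escaped braces.
bre_out=$(grep '^.\{12\}98765' "$tmpdir/file1")

# Basic regex with unescaped braces: { and } are literal characters,
# so nothing matches here. "|| true" hides grep's non-zero exit status.
plain_out=$(grep '^.{12}98765' "$tmpdir/file1" || true)

printf 'ERE:   %s\nBRE:   %s\nplain: %s\n' "$ere_out" "$bre_out" "$plain_out"

rm -rf "$tmpdir"
```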
    

    To get both lines:

    $ grep -E '^.{12}[89]8765' file1
    12342015010198765hello
    12342015010188765hello
    

    This works because [89]8765 just happens to match the IDs of interest in file2. The awk solution, of course, provides more flexibility in what IDs to match.
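If grep is preferred and the IDs do not share a convenient character class, one possible approach (a sketch, not from the original answer) is to generate an anchored pattern per ID with sed and pass the result to grep -E -f. The file name patterns and the scratch directory are hypothetical. The sketch assumes the IDs contain no regex metacharacters and that file2 has no blank lines; a blank line would become the pattern ^.{12}, which matches every line of at least 12 characters.

```shell
#!/bin/sh
# Scratch copies of the question's files (location is illustrative).
tmpdir=$(mktemp -d)
printf '%s\n' \
    12342015010198765hello \
    12342015010188765hello \
    12342015010178765hello > "$tmpdir/file1"
printf '%s\n' 98765 88765 > "$tmpdir/file2"

# Prefix every ID with ^.{12} so it can only match at columns 13-17.
# "patterns" is a hypothetical file name chosen for this example.
sed 's/^/^.{12}/' "$tmpdir/file2" > "$tmpdir/patterns"

# -E enables the {12} interval; -f reads one pattern per line.
grep_out=$(grep -E -f "$tmpdir/patterns" "$tmpdir/file1")
printf '%s\n' "$grep_out"

rm -rf "$tmpdir"
```

For the sample data this selects the same two lines as the awk command, without hand-writing a character class such as [89].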
