Matching multiple regexes on multiple lines passed as awk args

Views: 117 | Published: 2021/5/9 20:50:42 | Tags: bash, awk

Problem description

I'm trying to iterate through a large directory and run different regexes against each file to pull out the following data:

File name
Pattern matched
Line(s) matched
Number of occurrences

Thanks to @anubhava, I was able to get a script that searches for one regex across multiple lines and returns the data I need.

I've since tried to adapt the script (and butchered it in the process) to match more than one regex in a file and return the data for all of them. I could be looking for up to 8 regex patterns in one file. For now I'm trying to get it working with the regexes hardcoded in the script, but eventually I would like to pass the patterns in as args and run the match command against each one.

This is the awk script at present, but it throws the following error:

fatal: match: third argument is not an array

The script:

#!/usr/bin/awk -f

BEGIN {print ARGV[1], "(Filename)"}
{
    RS = "\r?\n" 
    filemsg= "new File() Found on line "
    fismmsg= "FileInputStream Found on line "
   while(match($0, /new[[:blank:]]+File\(/, /FileInputStream/)) {
      nf = match($0, /new[[:blank:]]+File\(/)
      fis = match($0, /FileInputStream/)
      if (nf != ""){
        print filemsg NR
        ++n
      }
      else if (fis != "") {
        print fismmsg NR
        ++m
      }
      $0 = substr($0, RSTART+RLENGTH)
   }
}
/new[[:blank:]]*$/ {
   p = NR
   next
}
/FileInputStream/ {
  l = NR
  next
}
p && NF {
   if (/^[[:blank:]]*File\(/) {
      print filemsg p, "&", NR
      ++n
   }
   p = 0
}
l && NF {
   if (/FileInputStream/) {
      print fismmsg p, "&", NR
      ++m
    }
}
END {
   if (n > 0) {
     print n, "(number of occurrences of new File() pattern)\n"
   }
   else if (m > 0) {
     print m, "(number of occurrences of FileInputStream pattern)\n"
   }
   else {
     print "No occurrences of new File() or FileInputStream\n"
   }
}

I've no doubt I'm doing something really dumb.
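A note on the error message: in gawk, match() takes the string, a single regex, and an optional third argument that must be an array (it receives the captured groups). Passing a second regex in that position is what triggers "third argument is not an array". As a minimal sketch, assuming it is acceptable to test both patterns in one pass, the two regexes can be joined with | so that match() still receives exactly one regex. The sample input lines here are made up for illustration.

```shell
#!/bin/sh
# Sketch: one match() call covering two patterns via alternation.
# The three input lines are hypothetical sample data.
out=$(printf 'x = new File("a")\ny = new FileInputStream(f)\nz = 1\n' |
awk '{
    line = $0
    # match() takes one regex; RSTART and RLENGTH describe the hit
    while (match(line, /new[[:blank:]]+File\(|FileInputStream/)) {
        print NR ": " substr(line, RSTART, RLENGTH)
        # continue searching after the current hit
        line = substr(line, RSTART + RLENGTH)
    }
}')
printf '%s\n' "$out"
```

Which alternative matched can be recovered from substr(line, RSTART, RLENGTH), so separate counters per pattern remain possible.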

Ideally I would pass each regex in as a variable and iterate over ARGV, using each one where the hardcoded values currently are. That raises another question, though: how would you split such an argument so it can be used across multiple lines, given that we add the likes of ^[[:blank:]] to check for blank space on a line before the rest of the pattern?
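One possible direction, offered as a sketch rather than a definitive answer: instead of splitting each pattern into per-line pieces, read the whole file into one buffer and let [[:space:]]+ in the pattern absorb the line breaks. The patterns are passed as one environment variable (PATTERNS, a name chosen here for illustration) with one regex per line, and line numbers are recovered by counting newlines before and inside each match. The temporary file and its contents are hypothetical sample data.

```shell
#!/bin/sh
# Hypothetical sample file standing in for a real source file.
tmp=$(mktemp)
printf 'a = new\n   File(x)\nb = new FileInputStream(f)\n' > "$tmp"

# One regex per line. ENVIRON values reach awk verbatim.
PATTERNS='new[[:space:]]+File\(
FileInputStream'
export PATTERNS

out=$(awk '
BEGIN { n = split(ENVIRON["PATTERNS"], re, "\n") }
{ buf = buf $0 "\n" }            # slurp the file so matches may span lines
END {
    for (i = 1; i <= n; i++) {
        rest = buf; offset = 0; count = 0
        while (match(rest, re[i]) && RLENGTH > 0) {
            start = RSTART; len = RLENGTH
            head = substr(buf, 1, offset + start - 1)   # text before the hit
            hit  = substr(rest, start, len)             # the matched text
            first = 1 + gsub(/\n/, "&", head)           # line where it starts
            last  = first + gsub(/\n/, "&", hit)        # line where it ends
            if (first == last) print re[i] " found on line " first
            else               print re[i] " found on lines " first " & " last
            count++
            offset += start + len - 1
            rest = substr(rest, start + len)
        }
        print count " occurrence(s) of " re[i]
    }
}' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

This runs once per file, which fits a loop or find over the directory. Rescanning the buffer for each match is simple but not fast, so very large files may need a different approach.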

Update

Sample input:

awk -v regex1="new[[:blank:]]+File\(" -v regex2="FileInputStream" -v regex3="org\\.apache\\.commons\\.net\\.ftp\\." -f parameterisedRegexAWKScript.awk "$file" >> "output.txt"
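A caveat about escaping when regexes are passed with -v: the shell handles backslashes inside double quotes first, and awk then applies string-escape processing to the -v value, so a backslash meant for the regex can be lost along the way. The sketch below, using a shortened stand-in pattern, shows two ways to end up with a single literal backslash inside awk. The variable names re and RE are illustrative only.

```shell
#!/bin/sh
# Single quotes stop the shell from touching the backslashes.
# -v then applies awk string-escape rules: \\ collapses to one backslash.
via_v=$(awk -v re='org\\.apache' 'BEGIN { print re }')

# ENVIRON values are taken verbatim, so one backslash is enough.
via_env=$(RE='org\.apache' awk 'BEGIN { print ENVIRON["RE"] }')

# Both lines should show the same regex text.
printf '%s\n%s\n' "$via_v" "$via_env"
```

Printing the value from a BEGIN block like this is a quick way to check what regex text awk actually received before using it in match().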

Sample output:

./modules/configuration/config/rules/somerule.gr (Filename)
No occurrences of new File() 

./modules/configuration/upgrade/contact/somecontact.gs (Filename)

No occurrences of new File() 

./modules/configuration/entity/someentity.gsx (Filename)
No occurrences of new File() 

./modules/configuration/FTP/newFileTest.txt (Filename)
new File() Found on line 15
new File() Found on line 18
new File() Found on line 28
new File() Found on line 37
new File() Found on line 53
5 (number of occurrences of new File() pattern)

./modules/configuration/FTP/test.txt (Filename)
new File() Found on line 3
new File() Found on line 4 & 8
new File() Found on line 10
new File() Found on line 10
4 (number of occurrences of new File() pattern)

./modules/configuration/personaldata/someperson.gs (Filename)
No occurrences of new File() 

./modules/configuration/processes/someprocess.gs (Filename)
No occurrences of new File() 

./originalAwkScript.txt (Filename)
new File() Found on line 6
new File() Found on line 29
new File() Found on line 32
3 (number of occurrences of new File() pattern)

Update 2

Contents of test.txt

new
File()
new File()
new



File()
File() new
new File() test new File(Test)
FileInputStream

Contents of another sample file in the same folder:

    protected function buildDocumentsPath(documentRootDir : String, documentTmpDir : String) {
    if (DocumentsPathParameter.HasContent) {
      DemoDocumentsPath = getAbsolutePath(DocumentsPathParameter, documentRootDir)
      if (!new test 
      File(DemoDocumentsPath).equals(new File(DocumentsPathParameter))) {
          Logger.DOCUMENT.warn((typeof this).RelativeName)
          DocumentsPath = getAbsolutePath(DocumentsPathParameter, documentTmpDir)
          var file = new File(DocumentsPath)
          if (!file.exists() && file.isDirectory()) {
              file.mkdirs()
          }
      } 
    }

  }

But the input files could be any Java class; there is nothing special about them.

Summary of the requirement: essentially, I'm trying to parse a large directory using a bash command that runs an awk script to search for different regexes. Those regexes can match across multiple lines in the classes, and I need to capture all the data listed at the top of the question. I have different categories of searches. For example, in the FTP category I'm looking for occurrences of 'new File(', 'FileInputStream', 'org.apache.commons.net.ftp' and 'java.nio.file', so there is a regex for each, but there are other categories such as print (which has a different regex), and so on. Ideally I want to be able to pass whichever regexes I'm searching for into the awk script as parameters and store the retrieved data in a file.
