Find XML files not containing a specific comment from the shell

Question

I want to search (with awk/grep/sed) through several XML files (pom.xml files), skipping some folders. The first condition is that they must contain the tag <module>. Among those files, I want to print out the ones that do not contain exactly the sequence below (it's autogenerated code, so this will help me detect whether somebody modified that sequence):

  <!--
         | Start of user code (user defined modules)
         |-->
        <!-- 
         | End of user code
         |-->

I'm stuck here:

        fileArray=($(find . -type f -not -path "./folder1/*" -not -path "*/folder2/*" -not -path "./folder3/*" -name "pom.xml" |
                    xargs awk -v RS='^$' 'match($0,/<module>[^\n]+/,a){print a[0]}'))

Any tips, please?

---UPDATE:

#!/bin/bash

###########################################################
# Checks for "user code" <modules> defined in pom files.
###########################################################

function check()
{
        # http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html

        OLDIFS=$IFS
        IFS=$'\n'

        # Read all pom files into an array
        # - Search for user code modules: it searches for the tag <module> in the pom files and, in case they contain modules,
        #   checks whether the autogenerated section has been modified. The text sequence is read from the "sequence" file.
        #
        # - Exclude model folder as the codegen poms therein require such a repository



        fileArray=($(find . -type f -not -path "./folder1/*" -not -path "*/folder2/*" -not -path "./folder3/*" -name "pom.xml" |
                         xargs `awk -v RS='^$' 'NR==FNR{str=$0;next} /<module>/ && !index($0,str){print FILENAME}' sequence {} +`))


        IFS=$OLDIFS

        # get length of an array
        numberOfFiles=${#fileArray[@]}

        # read all filenames
        for (( i=0; i<${numberOfFiles}; i++ ));
        do
          echo "ERROR:Found user code modules (file:line:occurrence): ${fileArray[$i]}"
        done


    if [ "$numberOfFiles" != "0" ]; then
        echo "SUMMARY:Found $numberOfFiles pom.xml file(s) containing user code modules."
        exit 1
    fi
}

check

----UPDATE (last console output)

    :~/temp> bash script.sh
    awk: cmd. line:1: fatal: cannot open file `{}' for reading (No such file or directory)
    ERROR:Found user code modules (file:line:occurrence): ./test_folder/test4/pom.xml ./test_folder/test1/pom.xml ./test_folder/test2/pom.xml ./test_folder/test3/pom.xml
    SUMMARY:Found 1 pom.xml file(s) containing user code modules.

Solution

Store that text in a file named foo and then run:

find ... -exec awk -v RS='^$' 'NR==FNR{str=$0;next} /<module>/ && !index($0,str){print FILENAME}' foo {} +
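
A quoted here-document is one way to create foo without the shell touching its contents: quoting the EOF delimiter turns off parameter expansion and command substitution inside it. This is only a sketch; the indentation below is copied from the question and is an assumption. Because index() compares the text byte for byte, it must match your pom.xml files exactly, spaces vs. tabs included.

```shell
#!/bin/sh
# Write the autogenerated comment block to a file named foo.
# The quoted 'EOF' delimiter disables expansion, so the text is written verbatim.
# ASSUMPTION: this indentation matches the pom.xml files; adjust it if yours differs.
cat > foo <<'EOF'
        <!--
         | Start of user code (user defined modules)
         |-->
        <!--
         | End of user code
         |-->
EOF
```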

Use whatever find options work for you to get the list of XML files. Whether you use -exec or pipe to xargs is up to you; I'm really just addressing the awk part, as that seems to be what you're having trouble with.
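
In the updated script, the backticks make the shell run awk at once, before xargs even starts, with the literal arguments sequence, {} and +. That is where "cannot open file `{}'" comes from. Because the substitution produces no output, xargs is left with no command and falls back to echo, printing every path on a single line; with IFS set to a newline, that line becomes one array element, hence "Found 1". The {} + suffix is find -exec syntax. With xargs, the fixed arguments (the awk program and the sequence file) come first, and xargs appends the file names, repeating the fixed arguments for every batch. The sketch below assumes GNU or BSD find (for -not and -print0); list_poms is a hypothetical helper that wraps the question's exclusions and prints NUL-terminated paths, so file names containing spaces survive the trip through xargs -0.

```shell
#!/bin/sh
# list_poms (hypothetical helper): print every pom.xml under the current
# directory, skipping the folders excluded in the question.
# -print0 terminates each path with NUL so that names containing spaces
# or newlines are passed intact to xargs -0.
list_poms() {
    find . -type f \
        -not -path "./folder1/*" \
        -not -path "*/folder2/*" \
        -not -path "./folder3/*" \
        -name pom.xml -print0
}
```

Pipe it into xargs -0 with foo placed before the names that xargs appends:

    list_poms | xargs -0 awk -v RS='^$' 'NR==FNR{str=$0;next} /<module>/ && !index($0,str){print FILENAME}' foo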

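The above uses GNU awk for the multi-character RS: with RS='^$', each file is read as a single record. NR==FNR is true only while reading the first file, foo, so its whole contents are saved in str. For every other file, it does a strict string search (index(), not a regexp match) for that text, exactly as written, and prints the name of any file that contains <module> but does not contain that string.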
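
The answer relies on GNU awk because POSIX leaves a multi-character RS unspecified. If GNU awk is not available, the sketch below performs the same check with POSIX awk features only, rebuilding each file's text line by line instead of slurping it as one record. This is a swapped-in technique, not the answer's code; the function name find_modified_poms is hypothetical, and the first argument is assumed to be a non-empty sequence file.

```shell
#!/bin/sh
# find_modified_poms SEQFILE FILE...   (hypothetical name)
# Prints each FILE that contains <module> but not the exact text of SEQFILE.
# Portable alternative to gawk's RS='^$': every file is rebuilt line by line
# and checked when the next file starts, and once more at END.
find_modified_poms() {
    awk '
        # Report the file collected so far if it fails the check.
        function report() {
            if (buf ~ /<module>/ && !index(buf, str))
                print cur
        }
        FNR == 1 {
            nfiles++
            if (nfiles > 2) report()   # the previous pom file is complete
            cur = FILENAME
            buf = ""
        }
        nfiles == 1 { str = str $0 "\n"; next }   # first file: the sequence
        { buf = buf $0 "\n" }                     # remaining files: the poms
        END { if (nfiles > 1) report() }          # check the last pom file
    ' "$@"
}
```

For example, with foo holding the sequence, `find_modified_poms foo ./test_folder/*/pom.xml` prints the path of every pom.xml that has <module> but no intact block. A shell function cannot be called from find -exec or xargs, so either pass it the paths directly as above or paste the awk program into the -exec awk ... foo {} + form.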

If that doesn't do what you want, edit your question to show a more complete sample input and expected output, including the text you want to search for in context within the input file.
