Find XML files not containing a specific comment from shell


Problem description

I want to search (with awk/grep/sed) a set of XML files (pom.xml files) while skipping some folders. The first condition is that a file must contain the tag <module>. Among those files, I want to print the ones that do not contain exactly the sequence below (it is autogenerated code, so this will help me detect whether somebody has modified it):

  <!--
         | Start of user code (user defined modules)
         |-->
        <!-- 
         | End of user code
         |-->

I'm stuck here:

        fileArray=($(find . -type f -not -path "./folder1/*" -not -path "*/folder2/*" -not -path "./folder3/*" -name "pom.xml" \
                    | xargs awk -v RS='^$' 'match($0,/\<module>[^\n]+/,a){print a[0]}'))

Some tips please?

---UPDATE:

  #!/bin/sh

###########################################################
# Checks for "user code" <modules> defined in pom files.
###########################################################

function check()
{
              # http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html

        OLDIFS=$IFS
        IFS=$'\n'

        # Read all pom files into an array
        # - Search for user code modules: It searches for the tag <module> into the pom files and in case they contain modules,
        # checks if the autogenerated section has been modified. Reading text sequence from foo.txt file
        #
        # - Exclude model folder as the codegen poms therein require such a repository



        fileArray=($(find . -type f -not -path "./folder1/*" -not -path "*/folder2/*" -not -path "./folder3/*" -name "pom.xml" \
                         | xargs `awk -v RS='^$' 'NR==FNR{str=$0;next} /<module>/ && !index($0,str){print FILENAME}' sequence {} +`))


        IFS=$OLDIFS

        # get length of an array
        numberOfFiles=${#fileArray[@]}

        # read all filenames
        for (( i=0; i<${numberOfFiles}; i++ ));
        do
          echo "ERROR:Found user code modules (file:line:occurrence): ${fileArray[$i]}"
        done


    if [ "$numberOfFiles" != "0" ]; then
        echo "SUMMARY:Found $numberOfFiles pom.xml file(s) containing user code modules."
        exit 1
    fi
}

check

----UPDATE (last console output)

    :~/temp> bash script.sh
awk: cmd. line:1: fatal: cannot open file `{}' for reading (No such file or directory)
ERROR:Found user code modules (file:line:occurrence): ./test_folder/test4/pom.xml ./test_folder/test1/pom.xml ./test_folder/test2/pom.xml ./test_folder/test3/pom.xml
SUMMARY:Found 1 pom.xml file(s) containing user code modules.

Solution

Store that text, exactly as it should appear in the XML files, in a file named foo and then run:

find ... -exec awk -v RS='^$' 'NR==FNR{str=$0;next} /<module>/ && !index($0,str){print FILENAME}' foo {} +

Use whatever find options work for you to get the list of XML files. Whether you use -exec or pipe to xargs is up to you; I'm really just addressing the awk part, as that seems to be what you're having trouble with.
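
As a sketch of that choice, the two forms below hand the same file list to a command. The exclusions are the ones from the question; the temporary directory, its layout, and the use of grep -l '<module>' as a stand-in for the awk program are assumptions made only so the example is self-contained. The "{} +" placeholder belongs to find's -exec and is not understood by xargs or awk. That would explain the "cannot open file {}" error in the question: the backticks run awk first, with {} and + as literal file names, and xargs, left without a command, falls back to its default of echo and prints every path on one line.

```shell
# Hypothetical sandbox so the example can run anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/folder1" "$demo/app/folder2" "$demo/app/core"
printf '<project><module>a</module></project>\n' > "$demo/folder1/pom.xml"
printf '<project><module>b</module></project>\n' > "$demo/app/folder2/pom.xml"
printf '<project><module>c</module></project>\n' > "$demo/app/core/pom.xml"
printf '<project></project>\n' > "$demo/app/pom.xml"

# Form 1: let find run the command itself; "{} +" batches the file names.
with_exec=$(cd "$demo" && find . -type f -not -path "./folder1/*" \
    -not -path "*/folder2/*" -not -path "./folder3/*" -name "pom.xml" \
    -exec grep -l '<module>' {} + | sort)

# Form 2: pipe NUL-separated names to xargs (safe for spaces in paths).
with_xargs=$(cd "$demo" && find . -type f -not -path "./folder1/*" \
    -not -path "*/folder2/*" -not -path "./folder3/*" -name "pom.xml" -print0 \
    | xargs -0 grep -l '<module>' | sort)

printf 'exec:  %s\n' "$with_exec"
printf 'xargs: %s\n' "$with_xargs"
```

Both forms should list the same files. -print0 and xargs -0 are used in the second form because plain xargs splits its input on whitespace.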

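The above uses GNU awk for multi-char RS, which makes awk read each file as a single record. It does a strict search for the entire contents of foo, appearing exactly as written, as a string in each of the XML files, and prints the name of any file that contains <module> but does not contain that string. Because this is a literal string comparison, any difference in whitespace or indentation counts as a mismatch.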
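
Where GNU awk is not available, one possible alternative (an assumption on my part, not what the command above does) is to build up each file's contents line by line instead of relying on RS='^$'. In this sketch the temporary directory, the sample XML files, and the helper function report are all made up for illustration; only the file name foo comes from the answer.

```shell
demo=$(mktemp -d)

# The exact text every pom is expected to contain (sample marker).
printf '%s\n' '<!--' ' | Start of user code (user defined modules)' ' |-->' \
    '<!--' ' | End of user code' ' |-->' > "$demo/foo"

# intact.xml holds the marker verbatim, edited.xml has it replaced,
# nomodule.xml has no <module> tag at all.
{ echo '<modules>'; cat "$demo/foo"; echo '<module>a</module>'; echo '</modules>'; } > "$demo/intact.xml"
{ echo '<modules>'; echo '<!-- tampered -->'; echo '<module>a</module>'; echo '</modules>'; } > "$demo/edited.xml"
echo '<project/>' > "$demo/nomodule.xml"

missing=$(awk '
    # Print the previous file name if it has <module> but lacks the marker.
    function report() {
        if (file != "" && index(body, "<module>") && !index(body, marker))
            print file
        body = ""
    }
    NR == FNR { marker = marker $0 "\n"; next }  # first file: the marker text
    FNR == 1  { report(); file = FILENAME }      # a new XML file starts
              { body = body $0 "\n" }            # accumulate its contents
    END       { report() }
' "$demo/foo" "$demo/intact.xml" "$demo/edited.xml" "$demo/nomodule.xml")

printf '%s\n' "$missing"
```

Only edited.xml should be reported: intact.xml contains the marker, and nomodule.xml fails the <module> condition. The sketch assumes the marker file is not empty, since NR == FNR would otherwise also be true for the first XML file.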

If that doesn't do what you want, then edit your question to show a more complete sample input/output example, including the text you want to search for in context in the input file.
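
For the reporting part of the script in the question, a sketch that avoids word-splitting into an array altogether is to read the file names one per line. The function name report_offenders and the sample paths are hypothetical; the messages are the ones from the question. In real use the function would be fed by the find/awk command instead of printf.

```shell
# Reads file names from stdin, one per line, prints one ERROR line per name
# and a SUMMARY line; returns 1 if any name was read, 0 otherwise.
report_offenders() {
    count=0
    while IFS= read -r name; do
        count=$((count + 1))
        echo "ERROR:Found user code modules (file:line:occurrence): $name"
    done
    if [ "$count" -ne 0 ]; then
        echo "SUMMARY:Found $count pom.xml file(s) containing user code modules."
        return 1
    fi
    return 0
}

# Stand-in for: find ... -exec awk ... foo {} + | report_offenders
status=0
output=$(printf '%s\n' './test1/pom.xml' './dir with space/pom.xml' | report_offenders) || status=$?
printf '%s\n' "$output"
```

Because each line is read whole, a path containing spaces stays a single entry and the count matches the number of files. Paths containing newlines are not handled by this sketch.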
