Unix - filename and string result on same line


Question

I need to search a directory that has hundreds or thousands of files, each containing XML with one or more instances of a specific string (a begin/end tag with data). I can get all instances of the string with:

grep -ho '<mytagname>..............<\/mytagname>' /home/xyzzy/mydata/*.XML > /home/mydata/tagvalues.txt

Then a few sed commands strip off the tags, so I wind up with a file containing just a list of values:

  value001
  value002
  value003

(etc.)

Ideally, though, I'd like each line of the file to also include the filename so I can import it into a database for analysis.

So my result would look like this:

fileAAA value001
fileAAA value002
fileAAA value003
fileBBB value004

The exact formatting of the above is flexible: it could use spaces or another separator, and it could even still include the begin/end tags.

The closest I've been able to get is with grep -o:

fileAAA:value001
value002
value003
fileBBB:value004

A Perl one-liner would seem ideal, but I'm new enough to Perl that I have no clue how to begin.

Recommended answer

This could be done using a one-liner like so:

perl -lne 'print "$ARGV $1" if /<mytagname>(.*?)<\/mytagname>/' *.xml

However, I'd strongly recommend that you use an actual XML parser like XML::Twig or XML::LibXML:

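use strict;
use warnings;

use XML::LibXML;

# Loop over every XML file in the data directory.
for my $file (</home/xyzzy/mydata/*.XML>) {
    # Parse the whole file into a document tree.
    my $doc = XML::LibXML->load_xml(location => $file);
    # Find every <mytagname> element, wherever it appears in the tree.
    for my $node ($doc->findnodes("//mytagname")) {
        print "$file " . $node->textContent() . "\n";
    }
}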
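
One reason a parser is the safer choice can be shown with a small experiment. A line-oriented regular expression only sees one line at a time, so an element whose tags sit on different lines is valid XML but goes unnoticed. The sketch below uses a hypothetical sample file to illustrate this; it is not taken from the original data.

```shell
#!/bin/sh
# Hypothetical sample file: the element is valid XML but spans several lines.
dir=$(mktemp -d)
printf '<r>\n<mytagname>\nvalue001\n</mytagname>\n</r>\n' > "$dir/fileAAA.XML"

# The one-liner reads one line at a time, so the opening and closing tags
# are never seen together and nothing is printed.
count=$(cd "$dir" && perl -lne 'print "$ARGV $1" if /<mytagname>(.*?)<\/mytagname>/' *.XML | wc -l)

echo "matches found: $count"
rm -rf "$dir"
```

Here the regex approach reports no matches even though the file contains one <mytagname> element. A parser works on the document tree rather than on lines, so it should still find the element, although the text it returns would include the surrounding newlines.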
