Unix - filename and string result on same line
Question
I need to search a directory that has hundreds or thousands of files, each containing XML with one or more instances of a specific string (a begin/end tag with data). I can get all the instances of the string with:
grep -ho '<mytagname>..............<\/mytagname>' /home/xyzzy/mydata/*.XML > /home/mydata/tagvalues.txt
and then run a few sed commands to strip off the tags, so I end up with a file containing just a list of values:
value001
value002
value003
(etc.)
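The grep-then-sed pipeline described above can be sketched roughly as follows. The question doesn't show its sed commands, so the two `s///` expressions are an assumption, as are the sample file names and values. The sketch also swaps the fixed run of dots for `[^<]*`, assuming tag values never contain `<`.

```shell
#!/bin/sh
# Hypothetical sample data: two small XML files in a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '<root><mytagname>value001</mytagname><mytagname>value002</mytagname></root>\n' > "$dir/fileAAA.XML"
printf '<root><mytagname>value003</mytagname></root>\n' > "$dir/fileBBB.XML"

# -h: omit file names; -o: print each match on its own line.
# [^<]* assumes tag values never contain "<".
grep -ho '<mytagname>[^<]*</mytagname>' "$dir"/*.XML |
  sed -e 's/<mytagname>//' -e 's/<\/mytagname>//' > "$dir/tagvalues.txt"

cat "$dir/tagvalues.txt"
```

Because `-h` discards the file name before sed ever sees the line, there is nothing left in the output that could tie a value back to its file.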
Ideally, though, I'd like each line of the file to also include the filename, so I can import it into a database for analysis.
So my result would look like this:
fileAAA value001
fileAAA value002
fileAAA value003
fileBBB value004
The exact formatting is flexible: it could use spaces or another separator, and it could even still include the begin/end tags.
The closest I've been able to get is with grep -o, which produces:
fileAAA:value001
value002
value003
fileBBB:value004
A Perl one-liner seems ideal, but I'm new enough to Perl that I have no idea how to begin.
Recommended answer
This can be done with a one-liner like so:
perl -lne 'print "$ARGV $1" if /<mytagname>(.*?)<\/mytagname>/' *.XML
However, I'd strongly recommend that you use an actual XML parser such as XML::Twig or XML::LibXML:
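#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Parse each file and print "filename value" for every <mytagname> element.
for my $file (glob '/home/xyzzy/mydata/*.XML') {
    my $doc = XML::LibXML->load_xml(location => $file);

    # XPath "//mytagname" finds the element anywhere in the document.
    for my $node ($doc->findnodes('//mytagname')) {
        print "$file " . $node->textContent() . "\n";
    }
}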