How can I mine an XML document with awk, Perl, or Python?

Question

I have an XML file with data in the following format:

<net NetName="abc" attr1="123" attr2="234" attr3="345".../>
<net NetName="cde" attr1="456" attr2="567" attr3="678".../>
....

Can anyone tell me how I could mine the data in this XML file with an awk one-liner? For example, I would like to know the attr3 value of abc, which should return 345.

Recommended answer

In general, you don't (see http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege). XML/HTML parsing is hard enough without trying to do it concisely, and while you may be able to hack together a solution that works on a limited subset of XML, it will eventually break.
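To make the "it will eventually break" point concrete, here's a small Python sketch. Python is used only because its standard library ships an XML parser, `xml.etree.ElementTree`. The two inputs are hypothetical, trimmed-down versions of the question's data, and both are equally valid XML: attribute values may use single or double quotes, and whitespace is allowed around `=`. The regex is just one plausible quick hack, not anyone's actual solution.

```python
import re
import xml.etree.ElementTree as ET

# Two equivalent, well-formed documents (hypothetical, trimmed-down data).
DOUBLE = '<netlist><net NetName="abc" attr3="345"/></netlist>'
SINGLE = "<netlist><net NetName='abc' attr3 = '345'/></netlist>"

def regex_attr3(text):
    """Naive hack: assumes double quotes and no spaces around '='."""
    m = re.search(r'NetName="abc"[^>]*attr3="([^"]*)"', text)
    return m.group(1) if m else None

def parser_attr3(text):
    """Parser approach: quoting and whitespace are the parser's problem."""
    net = ET.fromstring(text).find("net[@NetName='abc']")
    return net.get("attr3") if net is not None else None

for doc in (DOUBLE, SINGLE):
    print(regex_attr3(doc), parser_attr3(doc))
```

Running it prints `345 345` and then `None 345`. The regex silently misses the second document, while the parser reads both.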

Besides, there are many great languages with great XML parsers already written (see http://stackoverflow.com/questions/773340/can-you-provide-an-example-of-parsing-html-with-your-favorite-parser), so why not use one of them and make your life easier?
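For example, Python's standard-library `xml.etree.ElementTree` is enough for the lookup in the question. This is a minimal sketch that assumes the file is well-formed (wrapped in a single root element) and that the elided `...` attributes don't matter. `lookup` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the question's data ("..." omitted).
# With a real file you would use ET.parse("file.xml").getroot() instead.
DATA = """<netlist>
  <net NetName="abc" attr1="123" attr2="234" attr3="345"/>
  <net NetName="cde" attr1="456" attr2="567" attr3="678"/>
</netlist>"""

def lookup(root, net_name, attr):
    """Return attribute `attr` of the <net> whose NetName is `net_name`, or None."""
    for net in root.iter("net"):
        if net.get("NetName") == net_name:
            return net.get(attr)
    return None

root = ET.fromstring(DATA)
print(lookup(root, "abc", "attr3"))  # prints 345
```

If you really want a one-liner, something like this should work against a well-formed `file.xml`: `python3 -c "import xml.etree.ElementTree as ET; print(ET.parse('file.xml').find(\".//net[@NetName='abc']\").get('attr3'))"`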

I don't know whether or not there's an XML parser built for awk, but I'm afraid that if you want to parse XML with awk you're going to get a lot of "hammers are for nails, screwdrivers are for screws" answers. I'm sure it can be done, but it's probably going to be easier for you to write something quick in Perl that uses XML::Simple (my personal favorite) or some other XML parsing module.
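If you'd rather use Python, the feature that makes XML::Simple handy here is `KeyAttr` folding: the `<net>` elements become a hash keyed by `NetName`. Here's a rough standard-library equivalent, meant as an illustration rather than a port of XML::Simple. `fold_by` is a hypothetical helper, and the sketch assumes every `<net>` has a unique `NetName`.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed sample of the question's data ("..." omitted).
DATA = """<netlist>
  <net NetName="abc" attr1="123" attr2="234" attr3="345"/>
  <net NetName="cde" attr1="456" attr2="567" attr3="678"/>
</netlist>"""

def fold_by(root, tag, key_attr):
    """Map each <tag> element's key_attr value to a dict of its other attributes.

    Like XML::Simple's KeyAttr folding, the key attribute itself is dropped
    from the inner dict. Raises KeyError if an element lacks key_attr.
    """
    folded = {}
    for elem in root.iter(tag):
        attrs = dict(elem.attrib)
        key = attrs.pop(key_attr)
        folded[key] = attrs
    return folded

nets = fold_by(ET.fromstring(DATA), "net", "NetName")
print(nets["abc"]["attr3"])  # prints 345
```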

Just for completeness, I'd like to note that if your snippet is an example of the entire file, it is not well-formed XML. A well-formed XML document must have exactly one root element that encloses everything else, like so:

<netlist>
  <net NetName="abc" attr1="123" attr2="234" attr3="345".../>
  <net NetName="cde" attr1="456" attr2="567" attr3="678".../>
  ....
</netlist>

I'm sure malformed XML has its uses, but any conforming XML parser will reject a document with more than one top-level element. So unless you're dead set on using an awk one-liner to half-ass "parse" your "XML," you should consider making your XML well-formed.
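Here's a quick Python check of that point, plus the usual workaround if you can't change how the file is produced: wrap the fragment in a synthetic root element before parsing. `is_well_formed`, `parse_fragment`, and the `netlist` root name are illustrative assumptions. The wrapping trick only works if the fragment has no XML declaration or DOCTYPE of its own.

```python
import xml.etree.ElementTree as ET

# The question's snippet as-is (trimmed): two top-level elements, no root.
FRAGMENT = """<net NetName="abc" attr3="345"/>
<net NetName="cde" attr3="678"/>"""

def is_well_formed(text):
    """Return True if ElementTree can parse `text` as a complete document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def parse_fragment(text, root_tag="netlist"):
    """Parse a root-less fragment by wrapping it in a synthetic root element."""
    return ET.fromstring(f"<{root_tag}>{text}</{root_tag}>")

print(is_well_formed(FRAGMENT))  # False: more than one top-level element
root = parse_fragment(FRAGMENT)
print([net.get("NetName") for net in root.iter("net")])  # ['abc', 'cde']
```

For a file on disk, `parse_fragment(open("file.xml").read())` would do the same, under the same assumptions.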

In response to your edits, I still won't do it as a one-liner, but here's a Perl script that you can use:

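#!/usr/bin/perl

use strict;
use warnings;
use XML::Simple;

sub usage {
  die "Usage: $0 NetName [attr]\n";
}

# Check the arguments before doing any work: 1 or 2 are allowed.
usage() unless @ARGV == 1 || @ARGV == 2;
my ($net_name, $attr) = @ARGV;

# Fold every <net> element into a hash keyed by its NetName attribute.
# ForceArray makes the folding work even if the file has only one <net>.
my $file = XMLin("file.xml",
  KeyAttr    => { net => 'NetName' },
  ForceArray => ['net'],
);

exists $file->{net}{$net_name}
  or die "$net_name does not exist.\n";

my $net = $file->{net}{$net_name};

if (defined $attr) {
  # Print a single attribute value, e.g. 345 for abc attr3.
  exists $net->{$attr}
    or die "NetName $net_name does not have attribute $attr.\n";
  print "$net->{$attr}\n";

} else {
  # No attribute given: list every attribute of this net.
  print "$net_name:\n";
  print "  $_ = $net->{$_}\n" for sort keys %$net;
}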

Run this script from the command line with 1 or 2 arguments. The first argument is the 'NetName' you want to look up, and the second is the attribute you want to look up. If no attribute is given, it should just list all the attributes for that 'NetName'.
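For comparison, here's a Python sketch of the same command-line tool using only the standard library (`xml.etree.ElementTree` instead of XML::Simple). It assumes the same hard-coded `file.xml`, a well-formed file, and unique `NetName` values. `query` and `main` are hypothetical names. To run it as a script, call `main(sys.argv[1:])` from an `if __name__ == "__main__":` block.

```python
import sys
import xml.etree.ElementTree as ET

def query(root, net_name, attr=None):
    """Return output lines for a lookup, mirroring the Perl script's behavior.

    Raises KeyError (with the error message as its argument) if the net or
    the attribute does not exist.
    """
    for net in root.iter("net"):
        if net.get("NetName") == net_name:
            break
    else:
        raise KeyError(f"{net_name} does not exist.")
    # Like XML::Simple's folding, leave NetName itself out of the listing.
    attrs = {k: v for k, v in net.attrib.items() if k != "NetName"}
    if attr is not None:
        if attr not in attrs:
            raise KeyError(f"NetName {net_name} does not have attribute {attr}.")
        return [attrs[attr]]
    return [f"{net_name}:"] + [f"  {k} = {v}" for k, v in sorted(attrs.items())]

def main(argv):
    """Command-line entry point: NetName [attr]."""
    if len(argv) not in (1, 2):
        sys.exit(f"Usage: {sys.argv[0]} NetName [attr]")
    root = ET.parse("file.xml").getroot()  # same hard-coded file as the Perl script
    try:
        lines = query(root, *argv)
    except KeyError as err:
        sys.exit(err.args[0])  # prints the message to stderr, exit status 1
    print("\n".join(lines))
```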
