Populate array from XML end tags


Question

I am trying to create an array of field names that I can use later in my script. Regular expressions are kicking my butt, and I haven't written code in a long time. The field names are embedded in the XML tags, so I figured I could extract them from the end tags of my first row of data. I can't seem to populate the array properly. Can anyone shed some light for me?

my $firstLineOfXMLFile = <record>DEFECT000179<\record><state>Approved<\state><title>Something is broken<\title>

my @fieldNames = $firstLineOfXMLFile =~ m(<\(.*)>)g; #problem, can't seem to grab the text within the end tags.

print @fieldNames;

Thanks so much! -Matt

Recommended answer

Your sample data isn't XML: your slashes are backwards. Assuming it is XML you're trying to parse, the answer is "don't use regular expressions".

Regular expressions simply can't cope with recursion and nesting to the degree necessary.
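As a small illustration of the nesting problem, here is a sketch using a made-up input (the `<item>` element and the variable names are assumptions for the example, not part of the question). A simple pattern pairs the outer start tag with the inner end tag:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input: an <item> element nested inside another <item>.
my $xml = '<item><item>inner</item> outer</item>';

# The non-greedy match stops at the FIRST </item>, which closes the
# inner element, so the capture is an unbalanced fragment.
my ($content) = $xml =~ m,<item>(.*?)</item>,;

print "captured: $content\n";
```

The capture starts with a stray `<item>` and never sees ` outer`, which is the kind of mismatch a real parser avoids.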

So with that in mind, and assuming your sample data is actually well-formed XML and the backslashes are a typo, something like XML::Twig will do it quite handily:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig -> parse ( \*DATA );

#extract a single field value
print $twig -> root -> first_child_text('title'),"\n";
#get a field name
print $twig -> root -> first_child -> tag,"\n";
#can also use att() if you have attributes


print "Field names:\n";
#children() returns all the children of the current (in this case root) node
#We use map to access all, and tag to read their 'name'. 
#att or trimmed_text would do other parts of the XML. 
print join ( "\n", map { $_ -> tag } $twig -> root -> children );

__DATA__
<XML>
<record>DEFECT000179</record><state>Approved</state><title>Something is broken</title>
</XML>

This prints:

Something is broken
record
Field names:
record
state
title

You also have a variety of other really useful tools, such as pretty_print for formatting your output XML, twig_handlers that let you manipulate XML as you parse (particularly handy together with purge), cut and paste to move nodes around, and get_xpath to let you use an XPath expression to find elements based on path and attributes.
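A minimal sketch of two of those tools, twig_handlers with purge and get_xpath, might look like the following. It assumes XML::Twig is installed; the sample string and the variable names are made up for illustration and mirror the data above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

# Sample document, same shape as the one above.
my $xml = '<XML><record>DEFECT000179</record><state>Approved</state>'
        . '<title>Something is broken</title></XML>';

# 1. twig_handlers: the callback fires as soon as each <title> is complete.
my @titles;
XML::Twig->new(
    twig_handlers => {
        title => sub {
            my ( $t, $elt ) = @_;
            push @titles, $elt->text;
            $t->purge;    # discard what has been parsed so far
        },
    },
)->parse($xml);

# 2. get_xpath: look elements up by path after a full parse.
my $twig   = XML::Twig->new->parse($xml);
my @states = map { $_->text } $twig->get_xpath('//state');

print "titles: @titles\n";
print "states: @states\n";
```

Handlers with purge are mainly useful for large files, since the parsed elements are released as you go; get_xpath is more convenient when the whole document fits in memory.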

Based on the comments, if you really do want to extract data from:

</something>

What's going wrong in your regex is that .* is greedy. You need either a negated character class, like:

m,</([^>]+)>,g

Or a non-greedy match:

m,</(.*?)>,g
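To see the difference, here is a small sketch run against a correctly slashed version of the question's line. The variable names are made up for illustration, and only core Perl is used.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The question's sample line, with forward slashes in the end tags.
my $line = '<record>DEFECT000179</record><state>Approved</state>'
         . '<title>Something is broken</title>';

# Greedy: .* runs to the last '>' on the line, so there is only one match.
my @greedy = $line =~ m,</(.*)>,g;

# Negated character class: stop at the first '>' after '</'.
my @negated = $line =~ m,</([^>]+)>,g;

# Non-greedy: .*? takes as few characters as possible.
my @lazy = $line =~ m,</(.*?)>,g;

print "greedy:  @greedy\n";
print "negated: @negated\n";
print "lazy:    @lazy\n";
```

The greedy version returns a single element that swallows everything from the first end tag to the end of the line, while the other two each return record, state and title.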

Oh, and given that you have a backslash, you need to escape it:

my $firstLineOfXMLFile = '<record>DEFECT000179<\record><state>Approved<\state><title>Something is broken<\title>';
my @fieldNames = $firstLineOfXMLFile =~ m(<\\(.*?)>)g;
print @fieldNames;

That will do the trick. (But seriously: deliberately creating something that looks like XML but isn't is a really bad thing to do.)
