What's the easiest way to parse mixed text and element markup in XML?
Question
I know there are already multiple questions along these lines, but I couldn't find anything close enough to my problem. I want to parse some XML that looks something like the sample below. Only a few elements (maybe only <text/>) will have mixed markup; the rest can all be parsed easily with SimpleXML:
<root>
<element>
<text>A <x>b</x> c <y>d</y> e.</text>
</element>
</root>
I'm already using SimpleXML for most of the structure. However, when I get to the <text/> element, I don't know how to read the parts separately (i.e. "A ", " c " and " e." should be text; <x/> and <y/> should be elements) and in left-to-right order. All I can do is get all of the text without the markup, or just the child elements without the text. If this is not possible in SimpleXML, can I achieve it with DOM or XMLReader (http://www.php.net/manual/en/book.xmlreader.php)? I've been trying to turn the <text/> element into a DOMNodeList (so in this example I would have a list of five nodes), but I haven't been successful so far. What I've tried so far is:
dom_import_simplexml($xml)->getElementsByTagName('element'); // All <element/> elements
dom_import_simplexml($xml->element)->getElementsByTagName('text'); // Only one element, <text/>
There doesn't seem to be a method that returns a list of all child nodes (both text and tags) of a specific element. Are there any other classes in PHP that could do the job and that I have overlooked? As far as I can tell, SimpleXML can only fully parse XML in which each element contains only text, contains only other elements, or is empty.
Recommended answer
The following code does what I want using XMLReader, XMLReader::read() and XMLReader::nodeType:
<?php
$refl = new ReflectionClass('XMLReader');
$xml_consts = $refl->getConstants();
$xml = <<<XML
<root>
<element>
<text>A <x>b</x> c <y>d</y> e.</text>
</element>
</root>
XML;
$reader = new XMLReader();
$reader->XML($xml);
// For validation only
$reader->setParserProperty(XMLReader::VALIDATE, true);
if ($reader->isValid()) {
    print("No matter what people say, this XML is valid!\n\n");
}
// Prevent warnings about missing DTD
$reader->setParserProperty(XMLReader::VALIDATE, false);
// Walk the document node by node, in document order
while ($reader->read()) {
    $info = ': ';
    switch ($reader->nodeType) {
        case XMLReader::TEXT:
            $info .= "'$reader->value'";
            break;
        case XMLReader::ELEMENT:
            $info .= "<$reader->name>";
            break;
        case XMLReader::END_ELEMENT:
            $info .= "</$reader->name>";
            break;
        default:
            // Other node types (e.g. whitespace) are printed by name only
            $info = '';
    }
    // Map the numeric node type back to its XMLReader constant name
    print(array_search($reader->nodeType, $xml_consts) . $info . PHP_EOL);
}
?>
It outputs:
No matter what people say, this XML is valid!
ELEMENT: <root>
SIGNIFICANT_WHITESPACE
ELEMENT: <element>
SIGNIFICANT_WHITESPACE
ELEMENT: <text>
TEXT: 'A '
ELEMENT: <x>
TEXT: 'b'
END_ELEMENT: </x>
TEXT: ' c '
ELEMENT: <y>
TEXT: 'd'
END_ELEMENT: </y>
TEXT: ' e.'
END_ELEMENT: </text>
SIGNIFICANT_WHITESPACE
END_ELEMENT: </element>
SIGNIFICANT_WHITESPACE
END_ELEMENT: </root>
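Since the question also asks whether DOM can do this, here is a minimal sketch of that route. It assumes the rest of the document is still handled with SimpleXML and only the <text/> element is handed over to DOM via dom_import_simplexml(). The childNodes property of a DOM node lists text nodes and elements together in document order. The helper name mixedChildren() and the shape of the returned arrays are made up for illustration; CDATA sections and comments are ignored here.

```php
<?php
// Hypothetical helper (not part of any library): returns the direct
// children of $node in left-to-right order, keeping text and elements.
function mixedChildren(DOMNode $node): array
{
    $parts = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // Plain text between the child elements
            $parts[] = ['text', $child->nodeValue];
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            // Child element: keep its tag name and its text content
            $parts[] = ['element', $child->nodeName, $child->textContent];
        }
    }
    return $parts;
}

$xml = <<<XML
<root>
<element>
<text>A <x>b</x> c <y>d</y> e.</text>
</element>
</root>
XML;

// Navigate with SimpleXML as usual, then switch to DOM for <text/> only.
$sxml = simplexml_load_string($xml);
$textNode = dom_import_simplexml($sxml->element->text);

foreach (mixedChildren($textNode) as $part) {
    echo implode(' | ', $part), PHP_EOL;
}
```

For the sample document this yields the five nodes mentioned in the question: three text nodes and the <x/> and <y/> elements, in the order they appear.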