XML Parsing from Non-XML Document

Views: 163 Published: 2017/5/16 19:52:59 Tags: php xml design-patterns xml-parsing

Question

A file, which may or may not be valid XML, can contain XML blocks that I need to parse and replace with some other string. The scenario looks like this:

Some Text
<cnt:use name="abc" call="xyz">
   <cnt:param name="x" value="2" />
</cnt:use>
Some Text

There is no guarantee that the document is well-formed XML. It may contain unclosed tags or other common mistakes people make when typing HTML by hand, so I can't use SAX or DOM, and I can't even pass it to XSLT (am I right?). So what is the best way to extract the <cnt:*> parts from a non-XML document, read them, and then replace them with something else?

Recommended Answer

Hmm. The problem is that I have to implement this in PHP :( . Super sad. So, taking ideas from TagSoup, as mentioned in Mads Hansen's answer, I've made a mini SAX framework for PHP 5.3: https://github.com/neel/SuSAX/blob/master/sax.php

I am keeping it close to SAX, while also tracking tag nesting and keeping a parse tree. I've also kept a setNsFocus() method that restricts parsing to the tags you want to follow (in the example here, those in the cnt namespace).

<?php
// Assumption: sax.php from the repository linked above sits next to this script.
require_once __DIR__ . '/sax.php';

error_reporting(E_ALL);
ini_set('display_errors', 'On');
header('Content-Type: text/plain');

class MyParser extends \SuSAX\AbstractParser{
    // Handler for an opening tag.
    public function open($tag){
        echo ">> open ".$tag->ns().':'.$tag->name().'/'.$this->indentation().($this->parent() ? $this->parent()->name() : '')."\n";
        return "OO";
    }
    // Handler for a closing tag.
    public function close($tag){
        echo ">> close ".$tag->ns().':'.$tag->name().'/'.$this->indentation()."\n";
    }
    // Handler for a self-closing tag.
    public function standalone($tag){
        echo ">> standalone ".$tag->ns().':'.$tag->name().'/'.$this->indentation()."\n";
    }
}
$text = <<<TEXT
Hallo <b>W<html:i>o</html:i>rld</b>
<cnt:tag x="2" y="1">
<cnt:taga x="2" y="1"></cnt:taga>
</cnt:tag>
I am Here
TEXT;
$parser = new \SuSAX\Parser(new MyParser);
$parser->setNsFocus('cnt');
$parser->setText($text);
$text_ = $parser->parse();
var_dump($text_);
?>
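
For readers who only need the "find the <cnt:*> block and swap it out" part, the idea can be sketched without any library. The following is a minimal illustration, not the SuSAX implementation: replaceNsBlocks() and its callback signature are hypothetical names made up for this example. It scans for tags in one namespace with a regular expression, counts nesting depth, and replaces each outermost block, leaving all other (possibly broken) markup untouched. It assumes attribute values contain no literal '>' and does not check that closing tag names match their opening tags.

```php
<?php
/**
 * Replace every outermost <ns:...> block in $text with the callback's result.
 * Hypothetical helper for illustration only; this is not part of SuSAX.
 */
function replaceNsBlocks(string $text, string $ns, callable $replace): string
{
    // Match opening, closing, and self-closing tags of the focus namespace only.
    // Assumption: attribute values never contain a literal '>'.
    $pattern = '~<(/?)' . preg_quote($ns, '~') . ':([\w.-]+)([^>]*?)(/?)>~';
    preg_match_all($pattern, $text, $tags, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $out = '';
    $cursor = 0;     // byte offset up to which $text has been copied to $out
    $depth = 0;      // current nesting depth inside the focus namespace
    $start = 0;      // byte offset where the current outermost block begins
    $rootName = '';  // tag name of the current outermost block

    foreach ($tags as $m) {
        [$raw, $pos] = $m[0];
        $end = $pos + strlen($raw);

        if ($m[1][0] === '/') {               // closing tag
            if ($depth === 0) {
                continue;                     // stray closing tag: leave it as is
            }
            if (--$depth === 0) {             // outermost block just ended
                $out .= substr($text, $cursor, $start - $cursor);
                $out .= $replace($rootName, substr($text, $start, $end - $start));
                $cursor = $end;
            }
        } elseif ($m[4][0] === '/') {         // self-closing tag
            if ($depth === 0) {
                $out .= substr($text, $cursor, $pos - $cursor);
                $out .= $replace($m[2][0], $raw);
                $cursor = $end;
            }
        } else {                              // opening tag
            if ($depth === 0) {
                $start = $pos;
                $rootName = $m[2][0];
            }
            $depth++;
        }
    }

    // Copy whatever is left; a block that was never closed stays unchanged.
    return $out . substr($text, $cursor);
}

$sample = <<<'TEXT'
Some <b>Text
<cnt:use name="abc" call="xyz">
   <cnt:param name="x" value="2" />
</cnt:use>
Some Text
TEXT;

// The callback receives the tag name and the raw block text.
echo replaceNsBlocks($sample, 'cnt', function (string $name, string $block): string {
    return '[' . $name . ']';
});
```

Because only tags in the chosen namespace are matched, the unclosed <b> in the sample never reaches the scanner. A real parser such as the SuSAX one above would additionally expose attributes and a parse tree to the handler.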
