XML Parsing from Non-XML Document


Question

In an XML or non-XML file there may be some XML blocks that I need to parse and replace with some other string. The scenario is something like this:

Some Text
<cnt:use name="abc" call="xyz">
   <cnt:param name="x" value="2" />
</cnt:use>
Some Text

There is no guarantee that the document is well-formed XML. (There may be unclosed tags, or other common mistakes that a careless person can make while typing HTML.) So I can't use SAX or DOM, and I can't even pass it to XSLT (am I right?). So what's the best way to extract the <cnt:*> parts from the non-XML document, read them, and then replace them with something else?

Recommended Answer

Hmm. The problem is that I have to implement it in PHP :( Super sad. So, taking ideas from TagSoup, as mentioned in Mads Hansen's answer, I've made a mini SAX framework for PHP 5.3: https://github.com/neel/SuSAX/blob/master/sax.php

I am keeping it SAX-like, but at the same time I am also tracking the tag nesting and keeping a parse tree. I've added a setNsFocus() method that restricts the parser to the tags it should follow (the cnt namespace in the example below).

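<?php
// Assumption: sax.php from the SuSAX repository sits next to this script.
require_once __DIR__ . '/sax.php';

error_reporting(E_ALL);
ini_set('display_errors', 'On');
header('Content-Type: text/plain');

// Handler that logs every event raised for tags in the focused namespace.
class MyParser extends \SuSAX\AbstractParser {
    // Opening tag: prints namespace:name, the indentation and the parent tag's name (if any).
    public function open($tag) {
        echo ">> open ".$tag->ns().':'.$tag->name().'/'.$this->indentation().($this->parent() ? $this->parent()->name() : '')."\n";
        return "OO";
    }
    // Closing tag.
    public function close($tag) {
        echo ">> close ".$tag->ns().':'.$tag->name().'/'.$this->indentation()."\n";
    }
    // Self-closing (standalone) tag.
    public function standalone($tag) {
        echo ">> standalone ".$tag->ns().':'.$tag->name().'/'.$this->indentation()."\n";
    }
}

// Sample input: ordinary HTML mixed with tags from the cnt namespace.
$text = <<<TEXT
Hallo <b>W<html:i>o</html:i>rld</b>
<cnt:tag x="2" y="1">
<cnt:taga x="2" y="1"></cnt:taga>
</cnt:tag>
I am Here
TEXT;

$parser = new \SuSAX\Parser(new MyParser);
$parser->setNsFocus('cnt'); // only follow tags in the cnt namespace
$parser->setText($text);
$text_ = $parser->parse();
var_dump($text_);
?>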
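
For readers who only want the general idea without pulling in SuSAX, here is a minimal sketch of namespace-focused scanning with nesting tracking. It is not SuSAX's implementation: replaceNsTags(), its handler signature and the event names are hypothetical, it uses preg_replace_callback() instead of a real tokenizer, and it assumes PHP 7 or later and attribute values that are double-quoted and contain no ">" character.

```php
<?php
// Hypothetical helper (not part of SuSAX): finds tags in namespace $ns,
// reports them to $handler and replaces each tag with the handler's
// return value. Returning null keeps the original tag text.
function replaceNsTags(string $text, string $ns, callable $handler): string
{
    $depth = 0; // nesting depth of focused tags seen so far

    // Matches <ns:name ...>, <ns:name ... /> and </ns:name>.
    // All other markup, well-formed or not, is left untouched.
    $pattern = '~<(/?)' . preg_quote($ns, '~')
             . ':([A-Za-z_][\w.-]*)((?:\s+[^<>]*?)?)(/?)>~';

    $result = preg_replace_callback($pattern, function (array $m) use (&$depth, $handler) {
        // Collect name="value" pairs from the raw attribute text.
        $attrs = [];
        preg_match_all('~([\w:.-]+)\s*=\s*"([^"]*)"~', $m[3], $pairs, PREG_SET_ORDER);
        foreach ($pairs as $p) {
            $attrs[$p[1]] = $p[2];
        }

        if ($m[1] === '/') {
            $depth = max(0, $depth - 1); // tolerate stray closing tags
            $kind  = 'close';
            $level = $depth;
        } elseif ($m[4] === '/') {
            $kind  = 'standalone';
            $level = $depth;
        } else {
            $kind  = 'open';
            $level = $depth++; // report the current level, then go one deeper
        }

        $replacement = $handler($kind, $m[2], $attrs, $level);
        return $replacement ?? $m[0];
    }, $text);

    return $result ?? $text; // preg_replace_callback() returns null on error
}

// Input modelled on the question: a cnt block inside text that is not valid XML.
$text = <<<EOT
Some Text <b>unclosed
<cnt:use name="abc" call="xyz">
   <cnt:param name="x" value="2" />
</cnt:use>
Some Text
EOT;

$events = [];
$result = replaceNsTags($text, 'cnt', function ($kind, $name, $attrs, $level) use (&$events) {
    $events[] = "{$kind} {$name}@{$level}";
    if ($kind === 'standalone') {
        // Replace the parameter tag with a readable marker.
        return '[' . ($attrs['name'] ?? '') . '=' . ($attrs['value'] ?? '') . ']';
    }
    return ''; // drop the opening and closing cnt tags
});

echo $result, "\n";
```

Because the pattern only ever looks at tags with the chosen prefix, the unclosed <b> in the sample does not matter. The trade-off is that this sketch does not build a parse tree and cannot tell which closing tag belongs to which opening tag beyond a simple depth counter.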
