Unable to parse XML data with colon (:) from response using getNamespaces()


Problem description

I want to read whatever is inside the <q:content></q:content> tags in the following XML:

$xml = '<?xml version="1.0"?>
                    <q:response xmlns:q="http://api-url">
                        <q:impression>
                            <q:content>
                                <html>
                                    <head>
                                        <meta name="HandheldFriendly" content="True">
                                        <meta name="viewport" content="width=device-width, user-scalable=no">
                                        <meta http-equiv="cleartype" content="on">
                                    </head>
                                    <body style="margin:0px;padding:0px;">
                                        <iframe scrolling="no" src="http://some-url" width="320px" height="50px" style="border:none;"></iframe>
                                    </body>
                                </html>
                            </q:content>
                            <q:cpc>0.02</q:cpc>
                        </q:impression>
                    ...
                        ... some more things
                    ...
                    </q:response>';

I have put the XML in the variable above, and then I use SimpleXMLElement::getNamespaces as given in the section "Example #1 Get document namespaces in use":

//code continued
$dom = new DOMDocument;
 // load the XML string defined above
$dom->loadXML($xml);

var_dump($dom->getElementsByTagNameNS('http://api-url', '*') ); // shows object(DOMNodeList)#3 (0) { } 


foreach ($dom->getElementsByTagNameNS('http://api-url', '*') as $element) 
{
    //this does not execute
    echo 'see - local name: ', $element->localName, ', prefix: ', $element->prefix, "\n";
}

But the code inside the foreach loop does not execute.

I have read these questions:

How to read <abc:xyz> XML tags using PHP?

Update
I also tried the solution from "Parse XML with Namespace using SimpleXML":

$xml = new SimpleXMLElement($xml);
$xml->registerXPathNamespace('e', 'http://api-url');

foreach($xml->xpath('//e:q') as $event) {
    echo "not coming here";
    $event->registerXPathNamespace('e', 'http://api-url');
    var_export($event->xpath('//e:content'));
}

In this case too, the code inside the foreach does not execute. I am not sure whether I wrote everything correctly.

Further Update
Going with the first solution and setting error_reporting = -1, I found that the problem is with the URL in the src attribute of the iframe tag. I am getting warnings like:

Warning: DOMDocument::loadXML(): EntityRef: expecting ';' in Entity, line: 13

Updated code:

$xml = '<?xml version="1.0"?>
                    <q:response xmlns:q="http://api-url">
                        <q:impression>
                            <q:content>
                                <html>
                                    <head>
                                        <meta name="HandheldFriendly" content="True" />
                                        <meta name="viewport" content="width=device-width, user-scalable=no" />
                                        <meta http-equiv="cleartype" content="on" />
                                    </head>
                                    <body style="margin:0px;padding:0px;">
                                        <iframe scrolling="no" src="http://serve.qriously.com/v1/request?type=SERVE&aid=ratingtest&at=2&uid=0000000000000000&noHash=true&testmode=true&ua=Mozilla/5.0 (Linux; U; Android 2.2.1; en-us; Nexus One Build/FRG83) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1&appid=12e2561f048158249e30000012e256826ad&pv=2&rf=2&src=admarvel&type=get&lang=eng" width="320px" height="50px" style="border:none;"></iframe>
                                    </body>
                                </html>
                            </q:content>
                            <q:cpc>0.02</q:cpc>
                        </q:impression>
                        <q:app_stats>
                                <q:total><q:ctr>0.023809523809523808</q:ctr><q:ecpm>0.5952380952380952</q:ecpm></q:total>
                                <q:today><q:ctr>0.043478260869565216</q:ctr><q:ecpm>1.0869565217391306</q:ecpm></q:today>
                        </q:app_stats>
                    </q:response>';

Recommended answer

I had no problem getting it to work. The only error I could find is that you are loading XML that contains a chunk of non-XML HTML, which breaks the document: the meta elements in the head section are not closed.

See the demo.

Tip: Always activate error logging and reporting, and check for warnings and notices while you develop and debug code. A short one-liner that displays all kinds of PHP error messages, including warnings, notices and strict standards:

error_reporting(-1); ini_set('display_errors', 1);

DOMDocument is then talkative about malformed elements when loading XML.
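If the warnings are easier to handle as data than as output, libxml can also collect them. The following is only a minimal sketch: the `$badXml` string is a made-up, shortened stand-in for the response above, and the variable names are assumptions.

```php
<?php
// Collect libxml parse errors in memory instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical malformed input: the <meta> element is never closed.
$badXml = '<root><meta name="viewport"></root>';

$dom = new DOMDocument;
$loaded = $dom->loadXML($badXml); // false when the document is not well-formed

$errors = libxml_get_errors();
foreach ($errors as $error) {
    // Each LibXMLError carries the line number and the parser message.
    echo 'line ', $error->line, ': ', trim($error->message), "\n";
}

// Empty the error buffer so later parses start clean.
libxml_clear_errors();
```

Checking the return value of loadXML() together with the collected errors makes it obvious when later queries return nothing simply because the document never loaded.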

DOMDocument accepts only well-formed XML. If you have HTML, you can alternatively try whether DOMDocument::loadHTML() does the job as well; however, it will then convert the loaded string into an X(HT)ML document, which is probably not what you are looking for.

To make the string XML compatible before loading it, you can escape just the offending part: search for string patterns to obtain the substring that represents the HTML inside the XML, and XML-encode that substring properly.

For example, you can look for <html> and </html> as the surrounding tags, extract the substring from the opening tag through the closing tag, and replace it with substr_replace(). To encode the HTML so it can be used as data inside the XML, use the htmlspecialchars() function; it replaces the special characters with the five entities listed in the other SO answer.

Some mock-up code:

// Locate the embedded HTML block inside the XML string
$htmlStart = strpos($xml, '<html>');
if (false === $htmlStart) throw new Exception('<html> not found.');
$htmlEnd = strpos($xml, '</html>', $htmlStart);
if (false === $htmlEnd) throw new Exception('</html> not found.');
$htmlLen = $htmlEnd - $htmlStart + 7; // 7 === strlen('</html>')
$htmlString = substr($xml, $htmlStart, $htmlLen);
// Encode the HTML so it becomes plain character data inside the XML
$htmlEscaped = htmlspecialchars($htmlString, ENT_QUOTES);
$xml = substr_replace($xml, $htmlEscaped, $htmlStart, $htmlLen);
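To show how the pieces fit together, here is a hedged end-to-end sketch: it escapes the embedded HTML, loads the result, and reads the q:content element both through DOMDocument and through SimpleXML with XPath. The shortened `$xml` string, the `a=1&b=2` query string and the variable names `$contents` and `$fromXPath` are assumptions made for illustration; only the namespace URI and element names come from the question.

```php
<?php
// Hypothetical, shortened response: unclosed <meta> and a raw "&" in the URL.
$xml = '<?xml version="1.0"?>
<q:response xmlns:q="http://api-url">
  <q:impression>
    <q:content>
      <html><head><meta name="viewport" content="width=device-width"></head>
      <body><iframe src="http://some-url?a=1&b=2"></iframe></body></html>
    </q:content>
    <q:cpc>0.02</q:cpc>
  </q:impression>
</q:response>';

// Escape the embedded HTML so it becomes plain character data.
$htmlStart = strpos($xml, '<html>');
if (false === $htmlStart) throw new Exception('<html> not found.');
$htmlEnd = strpos($xml, '</html>', $htmlStart);
if (false === $htmlEnd) throw new Exception('</html> not found.');
$htmlLen = $htmlEnd - $htmlStart + strlen('</html>');
$escaped = htmlspecialchars(substr($xml, $htmlStart, $htmlLen), ENT_QUOTES);
$xml = substr_replace($xml, $escaped, $htmlStart, $htmlLen);

// Variant 1: DOMDocument, matching on namespace URI and local name.
$dom = new DOMDocument;
$dom->loadXML($xml);
$contents = [];
foreach ($dom->getElementsByTagNameNS('http://api-url', 'content') as $element) {
    // textContent decodes the entities, giving back the original HTML string.
    $contents[] = trim($element->textContent);
}
echo $contents[0], "\n";

// Variant 2: SimpleXML with XPath. The prefix "e" is bound to the namespace
// URI; the step after the colon must be the element's local name.
$sx = new SimpleXMLElement($xml);
$sx->registerXPathNamespace('e', 'http://api-url');
$nodes = $sx->xpath('//e:content');
$fromXPath = trim((string) $nodes[0]);
```

Note that the XPath expression `//e:q` from the question looks for an element whose local name is `q`. In this document `q` is only the namespace prefix, so that expression matches nothing even once the XML is well-formed.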
