Parse CDATA from a SOAP Response with PHP
I'm trying to parse the CDATA out of a SOAP response using SimpleXML and XPath. I get the output I'm looking for, but it comes back as one continuous line of data with no separators that would let me parse it.
I appreciate any help!
Here is the SOAP response containing the CDATA that I need to parse:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns1:getIPServiceDataResponse xmlns:ns1="http://ws.icontent.idefense.com/V3/2">
<ns1:return xsi:type="ns1:IPServiceDataResponse" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns1:status>Success</ns1:status>
<ns1:serviceType>IPservice_TIIncremental_ALL_xml_v1</ns1:serviceType>
<ns1:ipserviceData><![CDATA[<?xml version="1.0" encoding="utf-8"?><threat_indicators><tidata><indicator>URL</indicator><format>STRING</format><value>http://update.lflink.com/aspnet_vil/debug.swf</value><role>EXPLOIT</role><sample_md5/><last_observed>2012-11-02 18:13:43.587000</last_observed><comment>APT Blade2009 - CVE-2012-5271</comment><ref_id/></tidata><tidata><indicator>URL</indicator><format>STRING</format><value>http://update.lflink.com/crossdomain.xml</value><role>EXPLOIT</role><sample_md5/><last_observed>2012-11-02 18:14:04.108000</last_observed><comment>APT Blade2009 - CVE-2012-5271</comment><ref_id/></tidata><tidata><indicator>DOMAIN</indicator><format>STRING</format><value>update.lflink.com</value><role>EXPLOIT</role><sample_md5/><last_observed>2012-11-02 18:15:10.445000</last_observed><comment>APT Blade2009 - CVE-2012-5271</comment><ref_id/></tidata></threat_indicators>]]></ns1:ipserviceData>
</ns1:return>
</ns1:getIPServiceDataResponse>
</soapenv:Body>
</soapenv:Envelope>
Here is the PHP code I'm using to try to parse the CDATA:
<?php
$xml = simplexml_load_string($soap_response);
$xml->registerXPathNamespace('ns1', 'http://ws.icontent.idefense.com/V3/2');
foreach ($xml->xpath("//ns1:ipserviceData") as $item)
{
    echo '<pre>';
    print_r($item);
    echo '</pre>';
}
?>
Here's the print_r output:
SimpleXMLElement Object
(
[0] => URLSTRINGhttp://update.lflink.com/aspnet_vil/debug.swfEXPLOIT2012-11-02 18:13:43.587000APT Blade2009 - CVE-2012-5271URLSTRINGhttp://update.lflink.com/crossdomain.xmlEXPLOIT2012-11-02 18:14:04.108000APT Blade2009 - CVE-2012-5271DOMAINSTRINGupdate.lflink.comEXPLOIT2012-11-02 18:15:10.445000APT Blade2009 - CVE-2012-5271
)
Any ideas on how I can make the output usable? For example, by parsing out each element of the CDATA content, such as <indicator></indicator>, <value></value>, <role></role>, etc.
FYI: I also tried using LIBXML_NOCDATA, with no change in the output.
You get a single string because that is what you asked for: just the string content of the element.
If you want to parse that string as XML, create a new SimpleXML object from it.
That gives you a second parser, running on the string, which can parse the embedded XML (yes, it's that simple):
$soap = simplexml_load_string($soapXML);
$soap->registerXPathNamespace('ns1', 'http://ws.icontent.idefense.com/V3/2');
// Cast the matched element to a string to get the CDATA text, then parse that text as XML.
$ipserviceData = simplexml_load_string((string) $soap->xpath('//ns1:ipserviceData')[0]);
// <threat_indicators><tidata><indicator>URL</indicator>
echo $ipserviceData->tidata->indicator, "\n"; # URL
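Since the question asks for every <indicator>, <value> and <role>, the same idea can be extended to loop over all <tidata> records. The following is only a sketch: the helper name parseThreatIndicators and the shortened two-record sample response are made up for illustration, and the error handling is one possible choice.

```php
<?php
// Shortened stand-in for the SOAP response from the question (two records only).
$soapXML = <<<'XML'
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns1:getIPServiceDataResponse xmlns:ns1="http://ws.icontent.idefense.com/V3/2">
<ns1:return>
<ns1:ipserviceData><![CDATA[<?xml version="1.0" encoding="utf-8"?><threat_indicators><tidata><indicator>URL</indicator><format>STRING</format><value>http://update.lflink.com/crossdomain.xml</value><role>EXPLOIT</role></tidata><tidata><indicator>DOMAIN</indicator><format>STRING</format><value>update.lflink.com</value><role>EXPLOIT</role></tidata></threat_indicators>]]></ns1:ipserviceData>
</ns1:return>
</ns1:getIPServiceDataResponse>
</soapenv:Body>
</soapenv:Envelope>
XML;

/**
 * Hypothetical helper: pull the embedded document out of the CDATA section
 * and return one associative array per tidata record.
 */
function parseThreatIndicators(string $soapXML): array
{
    $soap = simplexml_load_string($soapXML);
    if ($soap === false) {
        throw new RuntimeException('Could not parse the SOAP envelope');
    }

    $soap->registerXPathNamespace('ns1', 'http://ws.icontent.idefense.com/V3/2');
    $nodes = $soap->xpath('//ns1:ipserviceData');
    if (!$nodes) {
        throw new RuntimeException('No ipserviceData element found');
    }

    // The CDATA content is plain text to the outer parser, so parse it a second time.
    $inner = simplexml_load_string((string) $nodes[0]);
    if ($inner === false) {
        throw new RuntimeException('Could not parse the embedded XML');
    }

    $rows = [];
    foreach ($inner->tidata as $tidata) {
        $rows[] = [
            'indicator' => (string) $tidata->indicator,
            'value'     => (string) $tidata->value,
            'role'      => (string) $tidata->role,
        ];
    }
    return $rows;
}

$rows = parseThreatIndicators($soapXML);
foreach ($rows as $row) {
    echo implode(' | ', $row), "\n";
}
```

Casting each child to a string keeps the result a plain array, which is easier to pass around than SimpleXMLElement objects.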
Btw, the LIBXML_NOCDATA flag only controls whether the <![CDATA[...]]> parts are preserved as CDATA nodes or merged into text nodes.
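To illustrate that point, here is a small sketch that loads the same document with and without the flag. The sample XML and variable names are invented for the example, and it assumes the DOM extension is available so the node type can be inspected.

```php
<?php
// Tiny made-up document with markup hidden inside a CDATA section.
$xml = '<root><data><![CDATA[<b>hello</b>]]></data></root>';

$plain  = simplexml_load_string($xml);
$merged = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

// The string value of the element is the same in both cases.
$plainText  = (string) $plain->data;
$mergedText = (string) $merged->data;

// Only the type of the child node differs, which the DOM extension can show.
$plainType  = dom_import_simplexml($plain->data)->firstChild->nodeType;
$mergedType = dom_import_simplexml($merged->data)->firstChild->nodeType;

var_dump($plainText === $mergedText);
var_dump($plainType === XML_CDATA_SECTION_NODE);
var_dump($mergedType === XML_TEXT_NODE);
```

Either way the embedded markup is still just text to the outer parser, which is why the flag alone does not turn it into child elements.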