SimpleXML XML Parsing


Question

I have created a script that fetches XML from a URL, updates a MySQL database, and writes the parsed data to a CSV file.

I get HTML strings in the XML, and they should not be there. How can I remove them while parsing?

I load the XML file like this:

$xml = simplexml_load_file(utf8_encode($xml_url), 'SimpleXMLElement', LIBXML_NOCDATA);

This is the error I get when running the script:

Warning: simplexml_load_file() [function.simplexml-load-file]: http://domain.com/api/get_catalog.php?id=351&user=878&key=b8:1: parser error : Space required after the Public Identifier in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: http://domain.com/api/get_catalog.php?id=351&user=878&key=b8:1: parser error : SystemLiteral " or ' expected in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: http://domain.com/api/get_catalog.php?id=351&user=878&key=b8:1: parser error : SYSTEM or PUBLIC, the URI is missing in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59
xml $ not loaded.

When I open the URL in Firefox and save the XML to disk, I can parse it without any problem. It only fails when I try to fetch it from the URL directly.

The XML looks fine. Part of it:

<?xml version="1.0" encoding="UTF-8"?>
<RecroKatalog>
<viewCustomerDiscount>
    <BrojArtikla>10214</BrojArtikla>
    <Naziv>Eksterno kucište 2.5&quot; S-ATA+IDE HDD, Aluminium, USB 2.0</Naziv>
    <NetoPrice>81.8224</NetoPrice>
    <Status>Dostupno</Status>
    <Opis></Opis>
    <dugi_opis>Isporucuje se u SIVOJ boji</dugi_opis>
    <Image>http://shop.lost.hr/data/images/big/10.jpg</Image>
    <WEB_Grupa>Ladice i eksterna kucišta - OSTALO</WEB_Grupa>
    <Akcija>0</Akcija>
    <Proizvodjac></Proizvodjac>
    <Klasifikacija>PH-25SD-B/VK220</Klasifikacija>
</viewCustomerDiscount>

Recommended answer

There are some HUGE clues in the error messages. The parser is complaining about seeing:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">

That is the start of an HTML document being served by that website, not the XML you're looking for.

This usually happens when you have to authenticate against the remote service (which is why it works in your browser, where you are logged in), but you're not telling SimpleXML to do that for you.
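
One way to confirm this is to download the response body yourself, check whether it is HTML or XML, and only then hand it to SimpleXML. The following is a minimal sketch, not the service's documented procedure: the helper names `fetchBody` and `parseCatalogBody` are hypothetical, and HTTP Basic authentication is only an assumption, since the real API may expect a session cookie, a token, or something else entirely.

```php
<?php
// Hypothetical helper: download the raw response body so it can be inspected.
// ASSUMPTION: HTTP Basic auth. The real service may use a different scheme.
function fetchBody(string $url, ?string $user = null, ?string $password = null): ?string
{
    $headers = ['Accept: application/xml, text/xml'];
    if ($user !== null) {
        $headers[] = 'Authorization: Basic ' . base64_encode($user . ':' . (string) $password);
    }
    $context = stream_context_create([
        'http' => [
            'header'        => implode("\r\n", $headers),
            'ignore_errors' => true, // still return the body on 4xx/5xx responses
            'timeout'       => 30,
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Hypothetical helper: parse the body only if it is not an HTML page.
// Returns null when the body is HTML or is not well-formed XML.
function parseCatalogBody(string $body): ?SimpleXMLElement
{
    $trimmed = ltrim($body);
    // A login or error page typically starts with an HTML doctype or <html> tag.
    if (stripos($trimmed, '<!DOCTYPE HTML') === 0 || stripos($trimmed, '<html') === 0) {
        return null;
    }
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($trimmed, 'SimpleXMLElement', LIBXML_NOCDATA);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $xml === false ? null : $xml;
}

// Usage (placeholder credentials, not executed here):
// $body = fetchBody($xml_url, 'user', 'secret');
// $xml  = $body === null ? null : parseCatalogBody($body);
// if ($xml === null) {
//     echo substr((string) $body, 0, 200); // show what the server actually sent
// }
```

Printing the first few hundred bytes of a rejected body shows which HTML page the server returned (a login form, an error page, a redirect notice), which in turn indicates what the request is missing.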
