PHP - Processing Invalid XML


Question

I'm using SimpleXML to load some XML files (which I didn't write or provide, and whose format I can't really change).

Occasionally (e.g. one or two files out of every 50 or so) they don't escape special characters (mostly &, but sometimes other random invalid things too). This creates an issue because SimpleXML in PHP simply fails to load them, and I don't know of any good way to handle parsing invalid XML.

My first idea was to preprocess the XML as a string and wrap ALL fields in CDATA so it would parse, but for some ungodly reason the XML I need to process puts all of its data in attributes, and CDATA sections cannot be used inside attribute values. Thus I can't use the CDATA idea. An example of the XML:

 <Author v="By Someone & Someone" />

What's the best way to replace all the invalid characters in the XML before I load it with SimpleXML?

Recommended answer

What you need is something that uses libxml's internal errors to locate the invalid characters and escape them accordingly. Here's a mockup of how I'd write it. Take a look at the result of libxml_get_errors() for the error info (each error carries a line, a column, and a message).

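function load_invalid_xml($xml)
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $use_internal_errors = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $sxe = simplexml_load_string($xml);

    // Compare against false explicitly: an element with no children or
    // attributes is a valid result but evaluates to false in a boolean test.
    if ($sxe !== false)
    {
        // Restore the caller's error-handling setting before returning.
        libxml_use_internal_errors($use_internal_errors);
        return $sxe;
    }

    $fixed_xml = '';
    $last_pos  = 0;

    foreach (libxml_get_errors() as $error)
    {
        // $pos is the byte offset of the faulty character in $xml.
        // compute_position() is a placeholder that is not defined here:
        // you have to write it yourself, mapping the reported line and
        // column to an offset in the string.
        $pos = compute_position($xml, $error->line, $error->column);
        $fixed_xml .= substr($xml, $last_pos, $pos - $last_pos) . htmlspecialchars($xml[$pos]);
        $last_pos = $pos + 1;
    }
    $fixed_xml .= substr($xml, $last_pos);

    // Discard the collected errors, then restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($use_internal_errors);

    return simplexml_load_string($fixed_xml);
}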
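
For the most common case mentioned in the question, a bare & inside an attribute value, a simpler alternative to mapping error positions may be enough. The following is a minimal sketch of that different technique (a regular-expression pre-pass, not the libxml-error approach from the mockup above). The helper names escape_bare_ampersands() and try_load_xml() are hypothetical and exist only for this illustration. It assumes that any & not already starting something shaped like an entity or character reference should become &amp;, and it also shows how to inspect what libxml_get_errors() reports.

```php
<?php
// Hypothetical helper (not from the answer above): escape every '&' that
// does not already begin an entity reference (&name;) or a character
// reference (&#123; or &#x1F;).
function escape_bare_ampersands(string $xml): string
{
    $pattern = '/&(?!(?:[A-Za-z_:][A-Za-z0-9_:.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/';
    // preg_replace() returns null on a regex engine error; fall back to the input.
    return preg_replace($pattern, '&amp;', $xml) ?? $xml;
}

// Hypothetical helper: parse a string quietly and return the result together
// with libxml's error messages, formatted with their line and column.
function try_load_xml(string $xml): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $sxe = simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Leave libxml's global state the way we found it.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$sxe, $messages];
}

// The sample from the question, plus one ampersand that is already escaped.
$raw = '<Author v="By Someone & Someone &amp; Co" />';

// Parsing the raw string fails; $errors lists what libxml complained about.
[$broken, $errors] = try_load_xml($raw);

// After the pre-pass the bare '&' is escaped and the existing '&amp;' is untouched.
[$fixed, $remaining] = try_load_xml(escape_bare_ampersands($raw));
```

This pre-pass only handles stray ampersands. A reference to an entity that was never declared (for example &foo;) is left alone and will still fail to parse, and other kinds of invalid input would need their own handling, which is where an error-driven approach like the mockup above becomes useful.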
