PHP - Processing Invalid XML
Question
I'm using SimpleXML to load some XML files (which I didn't write or provide, and whose format I can't really change).
Occasionally (e.g. one or two files out of every 50 or so) they don't escape special characters (mostly &, but sometimes other random invalid things too). This creates an issue because SimpleXML in PHP just fails, and I don't know of any good way to handle parsing invalid XML.
My first idea was to preprocess the XML as a string and wrap ALL fields in CDATA so it would work, but for some ungodly reason the XML I need to process puts all of its data in attributes. Thus I can't use the CDATA idea. An example of the XML:
<Author v="By Someone & Someone" />
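For reference, a minimal reproduction of the failure might look like the sketch below. The variable names are illustrative only; the calls themselves (libxml_use_internal_errors(), simplexml_load_string(), libxml_get_errors()) are standard PHP functions. The exact error messages and column numbers depend on the libxml version, so none are shown here.

```php
<?php
// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// The sample element from above, with its unescaped ampersand.
$xml = '<Author v="By Someone & Someone" />';
$sxe = simplexml_load_string($xml);

if ($sxe === false) {
    // Print where libxml thinks each problem is.
    foreach (libxml_get_errors() as $error) {
        printf(
            "line %d, column %d: %s\n",
            $error->line,
            $error->column,
            trim($error->message)
        );
    }
    libxml_clear_errors();
}
```

Loading the same element with the ampersand written as &amp; succeeds, which is why the failure is tied to the unescaped character.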
What's the best way to replace all the invalid characters in the XML before I load it with SimpleXML?
Recommended answer
What you need is something that uses libxml's internal errors to locate the invalid characters and escape them accordingly. Here's a mockup of how I'd write it. Take a look at the result of libxml_get_errors() for error info.
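function load_invalid_xml($xml)
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $use_internal_errors = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $sxe = simplexml_load_string($xml);
    // Compare against false: an empty element is a valid result but casts to false.
    if ($sxe !== false)
    {
        // Restore the caller's error-handling setting before returning.
        libxml_use_internal_errors($use_internal_errors);
        return $sxe;
    }

    $fixed_xml = '';
    $last_pos = 0;
    foreach (libxml_get_errors() as $error)
    {
        // $pos is the byte offset of the faulty character;
        // compute_position() is a helper you have to write yourself
        $pos = compute_position($xml, $error->line, $error->column);
        if ($pos < $last_pos || !isset($xml[$pos]))
        {
            continue; // skip duplicate or out-of-range positions
        }
        $fixed_xml .= substr($xml, $last_pos, $pos - $last_pos) . htmlspecialchars($xml[$pos]);
        $last_pos = $pos + 1;
    }
    $fixed_xml .= substr($xml, $last_pos);

    // Retry with the repaired string, then restore the previous setting.
    libxml_clear_errors();
    $sxe = simplexml_load_string($fixed_xml);
    libxml_use_internal_errors($use_internal_errors);
    return $sxe;
}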
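The mockup leaves compute_position() undefined. One possible sketch is shown below. It is a hypothetical helper, not part of PHP or libxml, and it takes the XML string as its first argument, since a line and column alone are not enough to compute an offset. It assumes "\n" line endings and that libxml's column is a 1-based byte count within the line. It is also an assumption that the reported column lands exactly on the faulty character; libxml may report a position just past it, so check the values on your own files before relying on this.

```php
<?php
// Hypothetical helper: converts a 1-based line/column pair (as found on
// LibXMLError) into a 0-based byte offset into $xml.
// Assumes "\n" line endings and a byte-based column.
function compute_position(string $xml, int $line, int $column): int
{
    $offset = 0;

    // Advance $offset to the first byte of the requested line.
    for ($i = 1; $i < $line; $i++) {
        $newline = strpos($xml, "\n", $offset);
        if ($newline === false) {
            break; // fewer lines than reported; stay on the last line
        }
        $offset = $newline + 1;
    }

    // Column 1 is the first byte of the line; clamp to the string bounds.
    $pos = $offset + max($column, 1) - 1;
    return max(0, min($pos, strlen($xml) - 1));
}

// The ampersand sits on line 2, column 9 of this string.
echo compute_position("<a>\n<b v=\"x & y\" />\n</a>", 2, 9), "\n"; // prints 12
```

Clamping keeps the result usable as a string index even when the reported position is past the end of the input, at the cost of silently pointing at the last byte in that case.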