How to skip invalid characters in an XML file using PHP
Question
I'm trying to parse an XML file using PHP, but I get an error message:
parser error : Char 0x0 out of allowed range in
I think it's because of the content of the XML: I think there is a special symbol "☆" in it. Any ideas what I can do to fix it?
I also get:
parser error : Premature end of data in tag item line
What could cause this error?
I'm using simplexml_load_file.
I tried finding the offending line and pasting its content into a separate XML file, and that file parses fine! So I still cannot figure out what makes the XML file fail to parse. PS: it's a huge XML file, over 100M. Could that cause the parse error?
Recommended answer
Do you have control over the XML? If so, ensure the data is enclosed in <![CDATA[ .. ]]> blocks.
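As a minimal sketch of the CDATA advice, assuming you generate the XML yourself with PHP's DOM extension (the element name item and the sample text are made up for illustration). CDATA lets markup characters such as < and & through unescaped, but it does not make control characters like 0x0 legal, so stripping is still required.

```php
<?php
// Assumption: the producer side is under your control and uses ext-dom.
$dom = new DOMDocument('1.0', 'UTF-8');
$item = $dom->createElement('item');

// createCDATASection() makes the text serialize as <![CDATA[ ... ]]>.
$item->appendChild($dom->createCDATASection('a < b & c ☆'));
$dom->appendChild($item);
$xmlString = $dom->saveXML();

// The consumer reads the text back unchanged.
$parsed = simplexml_load_string($xmlString);
$text = (string) $parsed;
```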
You also need to strip the invalid characters:
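/**
 * Replaces characters that are not allowed in XML 1.0 with a space.
 *
 * Note: ord() works on single bytes (0-255), so in practice only control
 * bytes below 0x20 other than tab, LF and CR are replaced. The ranges
 * above 0xFF are kept for reference but can never match a single byte.
 *
 * @access public
 * @param string $value
 * @return string
 */
function stripInvalidXml($value)
{
    $ret = "";
    // Compare explicitly: empty() would also discard the string "0".
    if ($value === null || $value === "")
    {
        return $ret;
    }
    $length = strlen($value);
    for ($i = 0; $i < $length; $i++)
    {
        // $value[$i] replaces the $value{$i} syntax, which was removed in PHP 8.
        $current = ord($value[$i]);
        if (($current == 0x9) ||
            ($current == 0xA) ||
            ($current == 0xD) ||
            (($current >= 0x20) && ($current <= 0xD7FF)) ||
            (($current >= 0xE000) && ($current <= 0xFFFD)) ||
            (($current >= 0x10000) && ($current <= 0x10FFFF)))
        {
            $ret .= chr($current);
        }
        else
        {
            $ret .= " ";
        }
    }
    return $ret;
}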
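One way to apply this before parsing, as a rough sketch rather than a definitive fix: the variant below works on Unicode code points instead of bytes by using preg_replace with the /u modifier. It assumes the input is valid UTF-8 and fits in memory. The name stripInvalidXmlChars and the sample string are hypothetical, and the inline string stands in for the contents of the real file.

```php
<?php
// Hypothetical helper: code-point-aware variant of the byte loop above.
// Assumes $value is valid UTF-8; preg_replace() returns null otherwise.
function stripInvalidXmlChars(string $value): string
{
    // Negated character class of the XML 1.0 "Char" production.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $clean = preg_replace($pattern, ' ', $value);
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// Stand-in for the file contents; contains a NUL byte (0x0) and a star symbol.
$raw = "<items><item>good \x00 data ☆</item></items>";

// Collect parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string(stripInvalidXmlChars($raw));

if ($xml === false) {
    // Each LibXMLError carries the line number and message of a failure.
    foreach (libxml_get_errors() as $error) {
        echo "line {$error->line}: ", trim($error->message), PHP_EOL;
    }
} else {
    echo (string) $xml->item, PHP_EOL;
}
```

The "☆" symbol is a legal XML character and passes through untouched; only the NUL byte is replaced. Reading a file of over 100M into a string may hit PHP's memory limit, so for input of that size the cleaning step would likely need to run in chunks.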