XML parser error: entity not defined


Problem description

I searched Stack Overflow for this problem and found a few topics, but none of them gave me a solid answer.

I have a form that users submit, and each field's value is stored in an XML file. The XML is encoded as UTF-8.

Every now and then a user copies and pastes text from somewhere, and that is when I get the "entity not defined" error.

I realize XML predefines only five entities (&amp;, &lt;, &gt;, &quot; and &apos;) and anything beyond that is not recognized unless it is declared, hence the parser error.

From what I gather, there are a few options:

  1. I can find and replace all &nbsp; and swap them out with &#32; or an actual space.
  2. I can place the code in question within a CDATA section.
  3. I can include these entities within the XML file.
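
As a rough illustration of the three options above, here is a minimal PHP sketch using SimpleXML. The `$input` string, the `<item>` element, and the variable names are hypothetical and not taken from the original form; the sketch also assumes the pasted text is already valid UTF-8.

```php
<?php
// Hypothetical pasted input: &nbsp; is an HTML entity that XML does not predefine.
$input = 'Fish&nbsp;&amp;&nbsp;chips';

// Option 1: swap the named entity for a numeric character reference.
// &#160; is the non-breaking space itself; &#32; would give an ordinary space.
$viaNumericRef = new SimpleXMLElement(
    '<item>' . str_replace('&nbsp;', '&#160;', $input) . '</item>'
);

// Option 2: wrap the text in CDATA so the parser leaves it untouched.
// A literal "]]>" in the input is split so it cannot end the section early.
$safe = str_replace(']]>', ']]]]><![CDATA[>', $input);
$viaCdata = new SimpleXMLElement(
    '<item><![CDATA[' . $safe . ']]></item>',
    LIBXML_NOCDATA // merge CDATA sections into ordinary text nodes
);

// Option 3: declare the entity in an internal DTD subset.
// LIBXML_NOENT asks libxml to substitute declared entities while parsing.
$viaDtd = new SimpleXMLElement(
    '<!DOCTYPE item [<!ENTITY nbsp "&#160;">]><item>' . $input . '</item>',
    LIBXML_NOENT
);
```

Options 1 and 3 leave a real non-breaking space character in the parsed text. Option 2 stores the markup verbatim, so `&nbsp;` stays unresolved until something downstream interprets the text as HTML.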

The workflow is this: the user enters content into a form, it gets stored in an XML file, and that content is then displayed as XHTML on a web page (parsed with SimpleXML).

Of the three options, or any other option(s) I'm not aware of, what's really the best way to deal with these entities?

Thanks, Ryan

UPDATE

I want to thank everyone for the great feedback. I actually determined what caused my entity errors. All the suggestions made me look into it more deeply!

Some textboxes were plain old textboxes, but my textareas were enhanced with TinyMCE. On closer inspection, the PHP warnings always referenced data from the TinyMCE-enhanced textareas. Later I noticed that on a PC all the offending characters were stripped out (because it couldn't read them), but on a Mac you could see little square boxes showing the Unicode number of each character. They showed up as squares on the Mac in the first place because I used utf8_encode to encode data that wasn't in UTF-8, to prevent other parsing errors (which are somehow also related to TinyMCE).

The solution to all this was quite simple:

I added the line entity_encoding : "utf-8" to my tinyMCE.init call. Now all the characters show up the way they are supposed to.

I guess the only thing I don't understand is why the characters still show up correctly when entered in plain textboxes, since nothing converts them to UTF-8 there, yet with TinyMCE it was a problem.

Solution

I agree that it is purely an encoding issue. In PHP, this is how I solved this problem:

  1. Before passing the HTML fragment to the SimpleXMLElement constructor, I decoded it using html_entity_decode.

  2. Then I encoded the result using utf8_encode().

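// Decode HTML entities such as &nbsp;; utf8_encode() then treats the result as ISO-8859-1 and converts it to UTF-8.
$headerDoc = '<temp>' . utf8_encode(html_entity_decode($headerFragment)) . '</temp>';
// Parse the wrapped fragment; named HTML entities no longer reach the XML parser.
$xmlHeader = new SimpleXMLElement($headerDoc);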

With this change, the code no longer throws undefined entity errors.
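
One caveat with decoding everything: html_entity_decode also turns &amp; and &lt; into raw & and < characters, which would themselves break the XML parser if the fragment contains them. A possible variant, sketched below, decodes only the HTML-specific named entities and leaves the XML-predefined ones escaped. The helper name `decodeNonXmlEntities` is hypothetical, and the sketch assumes the input is already UTF-8, so utf8_encode is not applied.

```php
<?php
// Hypothetical helper: decode HTML 4.01 named entities except those XML needs escaped.
function decodeNonXmlEntities(string $fragment): string
{
    // Character => entity map, for example "\u{A0}" => "&nbsp;".
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Flip it to entity => character and drop the entries XML must keep escaped.
    $map = array_flip($table);
    unset($map['&amp;'], $map['&lt;'], $map['&gt;'], $map['&quot;'], $map['&#039;']);

    // strtr() replaces every remaining named entity in a single pass.
    return strtr($fragment, $map);
}

// Example usage with a made-up fragment.
$fragment = '<p>Fish&nbsp;&amp;&nbsp;chips</p>';
$xml = new SimpleXMLElement('<temp>' . decodeNonXmlEntities($fragment) . '</temp>');
```

Here `&nbsp;` becomes a real non-breaking space before parsing, while `&amp;` is left for the XML parser to resolve.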
