How to deserialize an XML string along with NCR unescaping?


Problem description

I have a serialized XML string that I want to convert into an XML object. But this string contains a few numeric character references (NCRs), such as &#xA5; (¥). I used simplexml_load_string for deserialization, but it doesn't unescape these characters.

If I unescape the whole string with html_entity_decode first, the &amp; in the query parameters of URLs in the string is unescaped as well, which makes the URL invalid for the XML parser. For example, https://testURL.com?param1=a&amp;param2=b is converted to https://testURL.com?param1=a&param2=b, and the bare & before param2 is no longer valid XML.
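A minimal sketch of that failure, assuming a sample document with one URL attribute and one double-escaped NCR (the element and attribute names are made up for illustration). Decoding the whole string before parsing leaves a bare & in the URL, so the parser rejects the input:

```php
<?php
// Assumed sample input: a URL attribute plus a description holding a double-escaped NCR.
$xml = '<thing url="https://testURL.com?param1=a&amp;param2=b" description="blah &amp;#xA5; blah" />';

// Decoding the whole document turns every &amp; into a bare &, including the one in the URL.
$decoded = html_entity_decode($xml);

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$parsed = simplexml_load_string($decoded);

if ($parsed === false) {
    // The parser reports why the decoded string is no longer well-formed.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), PHP_EOL;
    }
}
libxml_clear_errors();
```

Here simplexml_load_string returns false, because &param2=b looks like an unterminated entity reference.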

One naive way would be to replace every &amp;# with &# before passing the string to simplexml_load_string, but that might break a few things. Please let me know a better way of doing this.
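To make "might break a few things" concrete, here is a hedged sketch. The first input is an assumed sample; the second is a hypothetical URL whose query string ends in & right before a fragment, chosen only to show one way the blanket replacement can go wrong:

```php
<?php
libxml_use_internal_errors(true);

// Case 1 (assumed sample): the replacement happens to work.
$xml = '<thing url="https://testURL.com?param1=a&amp;param2=b" description="blah &amp;#xA5; blah" />';
$sx = simplexml_load_string(str_replace('&amp;#', '&#', $xml));
$description = (string) $sx['description']; // the parser resolves &#xA5; to the yen sign

// Case 2 (hypothetical): the URL itself contains "&#", so the same
// replacement produces "&#top", which is not a valid character reference.
$xml2 = '<thing url="https://testURL.com?q=a&amp;#top" />';
$broken = simplexml_load_string(str_replace('&amp;#', '&#', $xml2));

var_dump($description === "blah \u{A5} blah"); // case 1 decoded as intended
var_dump($broken === false);                   // case 2 no longer parses
libxml_clear_errors();
```

The replacement cannot tell a double-escaped NCR apart from an escaped & that merely happens to be followed by #, which is why decoding after parsing, per value, is the safer order.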

Recommended answer

It sounds like what you have is content that has been double-escaped. You need to work out the order in which it was processed, then undo those steps in reverse order to get back the original text.

For instance, if the XML you have looks like this:

<thing url="https://testURL.com?param1=a&amp;param2=b" description="blah &amp;#xA5; blah" />

Then it's likely that the original transforms were:

  1. Escape description manually, changing ¥ to &#xA5;; leave URL unchanged
  2. Add url and description as XML attributes, escaping & to &amp;
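The two steps above can be reproduced with a short sketch. This is only a guess at how such a document might have been produced: the real producer is unknown, and the use of str_replace and htmlspecialchars here is an assumption made for illustration.

```php
<?php
// Assumed original values before any escaping.
$url = 'https://testURL.com?param1=a&param2=b';
$description = "blah \u{A5} blah"; // "\u{A5}" is the yen sign

// Original step 1: escape the description by hand; the URL is left alone.
$description = str_replace("\u{A5}", '&#xA5;', $description);

// Original step 2: write both values as XML attributes, escaping & to &amp;.
// The & that starts the NCR is escaped too, which is the double escaping.
$xml = sprintf(
    '<thing url="%s" description="%s" />',
    htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8'),
    htmlspecialchars($description, ENT_QUOTES | ENT_XML1, 'UTF-8')
);

echo $xml, PHP_EOL;
```

The url value has been escaped once and the description value twice, so they need different amounts of unescaping.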

So to reverse them, you need to:

  1. Reverse step 2: Extract url and description attributes (using SimpleXML)
  2. Reverse step 1: Unescape the description value, but leave the url value unchanged

Giving you:

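// Step 1; reverses the original step 2
// Parsing undoes one level of escaping: &amp; becomes &
$sx = simplexml_load_string($xml);
$url = (string)$sx['url'];
$description = (string)$sx['description'];

// Step 2; reverses the original step 1
// Decode the remaining NCR (&#xA5; becomes ¥) in the description only
$description = html_entity_decode( $description );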
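If several attributes need the same treatment, the two steps can be wrapped in a small function. The following is a self-contained sketch, not part of the original answer: extractAttributes is a hypothetical helper name, and it assumes the caller knows which attributes were escaped by hand.

```php
<?php
/**
 * Hypothetical helper: parse first, then entity-decode only the named attributes.
 *
 * @param string   $xml          Serialized XML with a single root element.
 * @param string[] $escapedNames Attributes that were escaped by hand before serialization.
 * @return array<string, string> Attribute values keyed by attribute name.
 */
function extractAttributes(string $xml, array $escapedNames): array
{
    libxml_use_internal_errors(true);
    $sx = simplexml_load_string($xml);
    if ($sx === false) {
        libxml_clear_errors();
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    $values = [];
    foreach ($sx->attributes() as $name => $value) {
        // The parser has already turned &amp; back into &.
        $value = (string) $value;
        if (in_array($name, $escapedNames, true)) {
            // Second level: resolve references such as &#xA5; in this value only.
            $value = html_entity_decode($value, ENT_QUOTES, 'UTF-8');
        }
        $values[$name] = $value;
    }
    return $values;
}

$xml = '<thing url="https://testURL.com?param1=a&amp;param2=b" description="blah &amp;#xA5; blah" />';
$attrs = extractAttributes($xml, ['description']);

echo $attrs['url'], PHP_EOL;         // URL with a plain & between the parameters
echo $attrs['description'], PHP_EOL; // description with the yen sign restored
```

Listing the hand-escaped attributes explicitly keeps the url value from being decoded a second time, which is the point of reversing the steps one at a time.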
