How to deserialize an XML string along with NCR unescaping?
Problem description
I have a serialized XML string that I want to convert into an XML object. But this string contains a few numeric character references (NCRs) such as &#xA5;. I used simplexml_load_string for deserialization, but it doesn't unescape these characters.
And if I unescape using html_entity_decode, the &amp; in the query parameters of URLs present in the string also gets unescaped, which invalidates the URL for the XML parser. For example, https://testURL.com?param1=a&amp;param2=b gets converted to https://testURL.com?param1=a&param2=b, and now &param2 is invalid for the XML parser.
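The failure described above can be reproduced with a minimal sketch. The sample <thing> element, its attribute names, and the variable names are assumptions chosen for illustration; they are not taken from the asker's real data.

```php
<?php
// Hypothetical sample: a URL with an escaped ampersand and a double-escaped NCR.
$xml = '<thing url="https://testURL.com?param1=a&amp;param2=b" description="blah &amp;#xA5; blah" />';

// Decoding the whole document first turns &amp; into a bare & inside the URL.
$decoded = html_entity_decode($xml, ENT_QUOTES, 'UTF-8');

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$result = simplexml_load_string($decoded);

// The bare "&param2" is not a valid entity reference, so parsing is expected to fail.
$failed = ($result === false);
libxml_clear_errors();
```

Decoding before parsing removes the one layer of escaping the XML parser itself depends on, which is why the decode step has to happen after the attributes have been extracted.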
One naive way would be to replace every &amp;# with &# before sending the string to simplexml_load_string, but that might break a few things. Please let me know a better way of doing this.
Recommended answer
It sounds like what you have is content that has been double-escaped; you need to work out the order in which it was processed, and then undo those steps in reverse order to get back the original text.
For instance, if the XML you have looks like this:
<thing url="https://testURL.com?param1=a&amp;param2=b" description="blah &amp;#xA5; blah" />
Then it's likely that the original transforms were:
1. Escape description manually, changing ¥ to &#xA5;; leave URL unchanged
2. Add url and description as XML attributes, escaping & to &amp;
So to reverse, you need to:
1. Reverse step 2: extract the url and description attributes (using SimpleXML)
2. Reverse step 1: unescape the description value, but leave the url value unchanged
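Giving you:
// Step 1; reverses the original step 2
// The XML parser undoes the &amp; escaping while reading the attributes
$sx = simplexml_load_string($xml);
$url = (string)$sx['url'];
$description = (string)$sx['description'];
// Step 2; reverses the original step 1
// Only the description carries a second layer of escaping
$description = html_entity_decode( $description );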
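As a self-contained sketch of the same two steps, the snippet below runs them against the sample element from this answer. The literal $xml string and the explicit ENT_QUOTES / 'UTF-8' arguments are assumptions added so the example runs on its own; the real input and encoding may differ.

```php
<?php
// Hypothetical input, matching the double-escaped example element above.
$xml = '<thing url="https://testURL.com?param1=a&amp;param2=b" description="blah &amp;#xA5; blah" />';

// Reverse the XML-attribute escaping: the parser turns &amp; back into &.
$sx = simplexml_load_string($xml);
$url = (string)$sx['url'];
$rawDescription = (string)$sx['description'];   // still contains the NCR as text

// Reverse the manual escaping, for the description only.
$description = html_entity_decode($rawDescription, ENT_QUOTES, 'UTF-8');

echo $url, PHP_EOL;
echo $description, PHP_EOL;
```

Because the URL never passes through html_entity_decode, its query string keeps the single & that the XML parser produced, while the description gets the extra decode it needs.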