Parsing XML which contains illegal characters


Question

A message I receive from a server contains tags, and the data I need is inside those tags.

I try to parse the payload as XML, but illegal-character exceptions are thrown.

I also used httpUtility and Security Utility to escape the illegal characters. The only problem is that they also escape the < and > characters, which are needed to parse the XML.

My question is: how do I parse XML when the data inside it contains characters that are illegal in XML? (& -> &amp;)

Thanks.

Example:

<item><code>1234</code><title>voi hoody & polo shirt + Mckenzie jumper</title><description>Good condition size small - medium, text me if interested</description></item>

Recommended answer

If & is the only invalid character, you can use a regex to replace it with &amp;. The regex uses a negative lookahead so that entities that are already present, such as &amp;, &quot;, and &#111;, are not escaped a second time.

The regex can be as follows:

&(?!(?:lt|gt|amp|apos|quot|#\d+|#x[a-f\d]+);)

Sample code:

using System.Text.RegularExpressions;
using System.Xml.Linq;
string content = @"<item><code>1234 &amp; test</code><title>voi hoody & polo shirt + Mckenzie jumper&other stuff</title><description>Good condition size small - medium, text me if interested</description></item>";
// Escape only bare ampersands; existing entities and character references are left untouched.
content = Regex.Replace(content, @"&(?!(?:lt|gt|amp|apos|quot|#\d+|#x[a-f\d]+);)", "&amp;", RegexOptions.IgnoreCase);
XElement xItem = XElement.Parse(content);
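
If the same cleanup is needed in several places, the regex could be wrapped in a small helper. The following is only a sketch: the class name XmlSanitizer and the methods EscapeBareAmpersands and ParseLenient are hypothetical names introduced here, not part of the original answer or of any library. It assumes, as the answer does, that a bare & is the only problem in the payload. A literal < inside the data would still make parsing fail.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper (not from the original answer) that wraps the regex shown above.
public static class XmlSanitizer
{
    // Matches '&' only when it is NOT followed by a predefined entity name
    // (lt, gt, amp, apos, quot) or a decimal/hex character reference plus ';'.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:lt|gt|amp|apos|quot|#\d+|#x[a-f\d]+);)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Replaces every bare '&' with "&amp;" and leaves existing entities alone.
    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    // Escapes bare ampersands, then parses the result as an XML element.
    public static XElement ParseLenient(string xml)
    {
        return XElement.Parse(EscapeBareAmpersands(xml));
    }
}
```

Calling XmlSanitizer.ParseLenient on the sample item from the question should yield an element whose title value contains the literal & again, because the parser decodes &amp; when it reads the text. One caveat: with RegexOptions.IgnoreCase an upper-case sequence such as &AMP; is also left untouched, yet XML entity names are case-sensitive, so input containing it would still be rejected by the parser.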
