Parsing XML which contains illegal characters
Question
A message I receive from a server contains tags, and the data I need is inside those tags.
I try to parse the payload as XML, but illegal character exceptions are thrown.
I also tried HttpUtility and Security Utility to escape the illegal characters. The only problem is that they also escape < and >, which are needed to parse the XML.
My question is: how do I parse XML when the data it contains includes characters that are illegal in XML? (& -> &amp;)
Thanks.
Example:
<item><code>1234</code><title>voi hoody & polo shirt + Mckenzie jumper</title><description>Good condition size small - medium, text me if interested</description></item>
Recommended answer
If & is the only invalid character, you can use a regex to replace it with &amp;. A regex with a negative lookahead is used so that already existing entities such as &amp;, &quot;, &#111;, etc. are not replaced a second time.
The regex can be as follows:
&(?!(?:lt|gt|amp|apos|quot|#\d+|#x[a-f\d]+);)
Sample code:
using System.Text.RegularExpressions;
using System.Xml.Linq;
// Escape every bare '&' (one that does not already start an entity), then parse.
string content = @"<item><code>1234 & test</code><title>voi hoody & polo shirt + Mckenzie jumper&other stuff</title><description>Good condition size small - medium, text me if interested</description></item>";
content = Regex.Replace(content, @"&(?!(?:lt|gt|amp|apos|quot|#\d+|#x[a-f\d]+);)", "&amp;", RegexOptions.IgnoreCase);
XElement xItem = XElement.Parse(content);
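The same idea can be wrapped in a small reusable helper. This is only a sketch: the class name LenientXml and its two methods are hypothetical names chosen for illustration, not part of .NET. It relies only on Regex.Replace and XElement.Parse, and it handles bare ampersands only, so a stray < inside the data would still make parsing fail.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper (name chosen for illustration only).
public static class LenientXml
{
    // Matches '&' that is NOT followed by a predefined entity name
    // or a decimal/hex character reference ending in ';'.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:lt|gt|amp|apos|quot|#\d+|#x[a-f\d]+);)",
        RegexOptions.IgnoreCase);

    // Replaces every bare '&' with "&amp;"; existing entities are left alone.
    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    // Escapes bare ampersands first, then parses the result as XML.
    public static XElement Parse(string xml)
    {
        return XElement.Parse(EscapeBareAmpersands(xml));
    }
}
```

For example, LenientXml.Parse("<title>a & b &amp; c</title>").Value gives a & b & c. Note that because of RegexOptions.IgnoreCase an upper-case spelling such as &AMP; is also left untouched, and since XML entity names are case-sensitive, the parser will still reject it.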