Remove all hexadecimal characters before loading string into XML Document Object?
Question
I have an XML string that is posted to an ashx handler on the server. The XML string is built on the client side and is based on a few different entries made on a form. Occasionally some users will copy and paste from other sources into the web form. When I try to load the XML string into an XmlDocument object using xmldoc.LoadXml(xmlStr), I get the following exception:
System.Xml.XmlException = {"'', hexadecimal value 0x0B, is an invalid character. Line 2, position 1."}
In debug mode I can see the rogue character (sorry, I'm not sure of its official name).
My question is: how can I sanitise the XML string before I attempt to load it into the XmlDocument object? Do I need a custom function to parse out all these sorts of characters one by one, or can I use some native .NET 4 class to remove them?
Recommended answer
Here is an example that removes invalid XML characters using Regex:
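using System.Text.RegularExpressions;
using System.Xml;

xmlString = CleanInvalidXmlChars(xmlString);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlString);
// Strips everything outside the XML 1.0 Char range (for example 0x0B).
public static string CleanInvalidXmlChars(string text)
{
    // \x takes only two hex digits in .NET, so use \u and handle surrogates explicitly.
    string re = @"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]" +
                @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]";
    return Regex.Replace(text, re, "");
}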
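Since the question also asks about a native .NET 4 class, here is a minimal sketch of a regex-free alternative. It assumes XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair (both in System.Xml, available from .NET Framework 4.0, as far as I know) are acceptable for your target; the class name XmlSanitizer and the method name RemoveInvalidXmlChars are hypothetical and not part of the original answer.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Hypothetical helper: keeps only characters that XML 1.0 permits.
    public static string RemoveInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Ordinary valid character (tab, CR, LF, 0x20 and above, ...).
                sb.Append(c);
            }
            else if (i + 1 < text.Length &&
                     XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // Valid high/low surrogate pair (a code point above U+FFFF).
                // Note the argument order: low surrogate first, then high.
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            // Anything else (0x0B, lone surrogates, ...) is dropped.
        }
        return sb.ToString();
    }
}
```

It would be called the same way as the regex version, for example xmlDoc.LoadXml(XmlSanitizer.RemoveInvalidXmlChars(xmlString)). Silently dropping characters changes the user's input, so logging what was removed may be worth considering.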