Best way to encode text data for XML


Question

I was looking for a generic method in .Net to encode a string for use in an Xml element or attribute, and was surprised when I didn't immediately find one. So, before I go much further: am I just missing a built-in function?

Assuming for a moment that it really doesn't exist, I'm putting together my own generic EncodeForXml(string data) method, and I'm thinking about the best way to do this.

The data that prompted this whole thing could contain bad characters like &, <, ", etc. It could also occasionally contain properly escaped entities (&amp;, &lt;, and &quot;), which means just using a CDATA section may not be the best idea. That seems kind of clunky anyway; I'd much rather end up with a nice string value that can be used directly in the xml.

I've used a regular expression in the past to catch just the bad ampersands. I'm thinking of using it here as the first step, and then doing a simple replace for the other characters.

So, could this be optimized further without making it too complex, and is there anything I'm missing?

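Imports System.Text.RegularExpressions   ' goes at the top of the file

Function EncodeForXml(ByVal data As String) As String
    ' Matches an ampersand that does not already start an entity such as &amp; or &#34;
    Static badAmpersand As New Regex("&(?![a-zA-Z]{2,6};|#[0-9]{2,4};)")

    ' Escape only the bare ampersands first, so existing entities are left intact
    data = badAmpersand.Replace(data, "&amp;")

    ' Then escape the remaining special characters
    Return data.Replace("<", "&lt;").Replace("""", "&quot;").Replace(">", "&gt;")
End Function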

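Sorry to all you C#-only folks: I don't really care which language I use, but I wanted the Regex to be static, and you can't do that in C# without declaring it outside the method, so this will be VB.Net.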

Finally, we're still on .Net 2.0 where I work, but if someone could take the final product and turn it into an extension method for the string class, that'd be pretty cool too.

Update: The first few responses indicate that .Net does indeed have built-in ways of doing this. But now that I've started, I kind of want to finish my EncodeForXml() method just for the fun of it, so I'm still looking for ideas for improvement. Notably: a more complete list of characters that should be encoded as entities (perhaps stored in a list/map), and something that performs better than chaining .Replace() calls on immutable strings.
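One possible direction for the two goals in the update, sketched under assumptions: a single pass over the input that appends to a StringBuilder, so no intermediate strings are created. The module and function names below are hypothetical. Unlike the regex version above, this sketch escapes every ampersand, including those that already begin an entity, so it only suits input known to be unescaped.

```vbnet
Imports System.Text

Module XmlEncodingSketch
    ' Hypothetical helper, not a built-in API.
    ' Assumption: the input contains no pre-escaped entities;
    ' an existing "&amp;" would come out as "&amp;amp;".
    Function EncodeForXmlSinglePass(ByVal data As String) As String
        If String.IsNullOrEmpty(data) Then Return data

        ' Reserve a little extra room for the expanded entities
        Dim sb As New StringBuilder(data.Length + 16)

        For Each c As Char In data
            Select Case c
                Case "&"c
                    sb.Append("&amp;")
                Case "<"c
                    sb.Append("&lt;")
                Case ">"c
                    sb.Append("&gt;")
                Case """"c
                    sb.Append("&quot;")
                Case "'"c
                    sb.Append("&apos;")
                Case Else
                    sb.Append(c)
            End Select
        Next

        Return sb.ToString()
    End Function
End Module
```

The five cases are the predefined XML entities. The Select Case could be swapped for a Dictionary(Of Char, String) lookup if the list needs to be configurable.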

Recommended answer

System.Xml handles the encoding for you, so you don't need a method like this.
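As a minimal sketch of what that can look like, the example below writes the same raw string as an attribute value and as element text through XmlWriter, which is available in .Net 2.0. The element name, attribute name, and sample input are made up for illustration.

```vbnet
Imports System
Imports System.Text
Imports System.Xml

Module XmlWriterSketch
    Sub Main()
        ' Sample input, invented for illustration
        Dim raw As String = "Fish & Chips <""large"">"

        Dim settings As New XmlWriterSettings()
        settings.OmitXmlDeclaration = True   ' keep the output to the element alone

        Dim sb As New StringBuilder()
        Using writer As XmlWriter = XmlWriter.Create(sb, settings)
            writer.WriteStartElement("item")           ' hypothetical element name
            writer.WriteAttributeString("note", raw)   ' the writer escapes the attribute value
            writer.WriteString(raw)                    ' the writer escapes the element text
            writer.WriteEndElement()
        End Using

        Console.WriteLine(sb.ToString())
    End Sub
End Module
```

Note that the writer treats its input as raw text, so a string that already contains &amp; would be escaped again to &amp;amp;. Mixed input like the question describes would have to be decoded or normalized first.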
