Most efficient way to escape XML/HTML in C++ string?
Problem description
I can't believe this question hasn't been asked before. I have a string that needs to be inserted into an HTML file but it may contain special HTML characters. I want to replace these with the appropriate HTML representation.
The code below works but is pretty verbose and ugly. Performance is not critical for my application but I guess there are scalability problems here also. How can I improve this? I guess this is a job for STL algorithms or some esoteric Boost function, but the code below is the best I can come up with myself.
void escape(std::string *data)
{
    std::string::size_type pos = 0;
    for (;;)
    {
        // Find the next character that needs escaping.
        pos = data->find_first_of("\"&<>", pos);
        if (pos == std::string::npos) break;
        std::string replacement;
        switch ((*data)[pos])
        {
        case '\"': replacement = "&quot;"; break;
        case '&':  replacement = "&amp;";  break;
        case '<':  replacement = "&lt;";   break;
        case '>':  replacement = "&gt;";   break;
        default: ;
        }
        // Replace in place, then skip past the inserted entity.
        data->replace(pos, 1, replacement);
        pos += replacement.size();
    }
}
Instead of replacing in the original string, you can copy it into a new buffer and do the replacements on the fly, which avoids moving characters within the string. Each in-place replace has to shift the rest of the string, so the in-place version is quadratic in the worst case, while the copy is a single linear pass with better cache behavior; I'd expect a huge improvement. Or you can use boost::spirit::xml encode or http://code.google.com/p/pugixml/.
void encode(std::string& data) {
    std::string buffer;
    buffer.reserve(data.size());
    for(size_t pos = 0; pos != data.size(); ++pos) {
        switch(data[pos]) {
            case '&':  buffer.append("&amp;");       break;
            case '\"': buffer.append("&quot;");      break;
            case '\'': buffer.append("&apos;");      break;
            case '<':  buffer.append("&lt;");        break;
            case '>':  buffer.append("&gt;");        break;
            default:   buffer.append(&data[pos], 1); break;
        }
    }
    // Hand the escaped text back to the caller without another copy.
    data.swap(buffer);
}
EDIT: A small improvement can be achieved by using a heuristic to determine the size of the buffer. Replace the argument of the buffer.reserve call with data.size()*1.1 (10% extra) or something similar, depending on how many replacements are expected.
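As a rough sketch of how the copy-based approach and the reserve heuristic from the EDIT might fit together: the function name escape_copy and the choice to return a new string instead of swapping are assumptions for illustration, not part of the answer above. The 10% figure is the answer's own guess, written here in integer arithmetic to avoid the implicit double-to-size_t conversion.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (name and signature are assumptions): returns an
// escaped copy and leaves the input untouched.
std::string escape_copy(const std::string& data) {
    std::string buffer;
    // Heuristic from the EDIT: reserve roughly 10% more than the input size.
    // This only reduces reallocations; the string still grows if needed.
    buffer.reserve(data.size() + data.size() / 10);
    for (std::size_t pos = 0; pos != data.size(); ++pos) {
        switch (data[pos]) {
            case '&':  buffer.append("&amp;");  break;
            case '"':  buffer.append("&quot;"); break;
            case '\'': buffer.append("&apos;"); break;
            case '<':  buffer.append("&lt;");   break;
            case '>':  buffer.append("&gt;");   break;
            default:   buffer.push_back(data[pos]); break;
        }
    }
    return buffer;
}
```

Calling escape_copy("a < b") would give "a &lt; b". One caveat: &apos; is predefined in XML but was not part of HTML 4, so if the output has to work in old HTML consumers, &#39; is probably the safer replacement for the single quote.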