Most efficient way to escape XML/HTML in C++ string?

Question

I can't believe this question hasn't been asked before. I have a string that needs to be inserted into an HTML file but it may contain special HTML characters. I want to replace these with the appropriate HTML representation.

The code below works but is pretty verbose and ugly. Performance is not critical for my application but I guess there are scalability problems here also. How can I improve this? I guess this is a job for STL algorithms or some esoteric Boost function, but the code below is the best I can come up with myself.

#include <string>

// Escapes the HTML special characters ", &, < and > in place.
void escape(std::string *data)
{
    std::string::size_type pos = 0;
    for (;;)
    {
        // Find the next character that needs escaping.
        pos = data->find_first_of("\"&<>", pos);
        if (pos == std::string::npos) break;
        std::string replacement;
        switch ((*data)[pos])
        {
        case '"': replacement = "&quot;"; break;
        case '&': replacement = "&amp;";  break;
        case '<': replacement = "&lt;";   break;
        case '>': replacement = "&gt;";   break;
        default: break;
        }
        data->replace(pos, 1, replacement);
        // Skip past the inserted entity so its '&' is not escaped again.
        pos += replacement.size();
    }
}

Answer

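Instead of replacing in place in the original string, you can copy it into a new buffer and substitute on the fly, which avoids having to move characters in the string. Each in-place replace shifts every character after the match, so an input with many special characters approaches quadratic time, while the copying version makes a single linear pass. This gives much better complexity and cache behavior, so I'd expect a huge improvement. Alternatively, you can use boost::spirit::xml encode (http://www.tena-sda.org/doc/5.2.1/boost/d3/df1/namespaceboost_1_1spirit_1_1xml.html) or pugixml (http://code.google.com/p/pugixml/).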

#include <cstddef>
#include <string>

// Escapes &, ", ', < and > by building a new string in one pass.
void encode(std::string& data) {
    std::string buffer;
    buffer.reserve(data.size());  // the output is at least as long as the input
    for (std::size_t pos = 0; pos != data.size(); ++pos) {
        switch (data[pos]) {
            case '&':  buffer.append("&amp;");      break;
            case '"':  buffer.append("&quot;");     break;
            case '\'': buffer.append("&apos;");     break;
            case '<':  buffer.append("&lt;");       break;
            case '>':  buffer.append("&gt;");       break;
            default:   buffer.push_back(data[pos]); break;
        }
    }
    data.swap(buffer);  // replace the input with the escaped copy
}

EDIT: A small improvement can be achieved by using a heuristic to determine the size of the buffer. In the buffer.reserve line, replace data.size() with data.size()*1.1 (10% extra) or something similar, depending on how many replacements are expected.
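
To make the reserve heuristic concrete, here is a minimal sketch of the same copying approach with roughly 10% headroom. The name escape_html_copy and the return-by-value signature are assumptions for illustration, not part of the answer above, and the 10% figure is just the rule of thumb from the EDIT note rather than a measured value.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original answer): returns an escaped copy
// instead of modifying the argument in place.
std::string escape_html_copy(const std::string& data) {
    std::string buffer;
    // Reserve about 10% extra room for the entities. Integer arithmetic
    // avoids the double-to-size_t conversion that data.size() * 1.1 implies.
    buffer.reserve(data.size() + data.size() / 10);
    for (std::size_t pos = 0; pos != data.size(); ++pos) {
        switch (data[pos]) {
            case '&':  buffer.append("&amp;");      break;
            case '"':  buffer.append("&quot;");     break;
            case '\'': buffer.append("&apos;");     break;
            case '<':  buffer.append("&lt;");       break;
            case '>':  buffer.append("&gt;");       break;
            default:   buffer.push_back(data[pos]); break;
        }
    }
    return buffer;
}
```

A call would look like std::string html = escape_html_copy(user_input);. If the guess is too low the string simply reallocates as it grows, so the heuristic only affects speed, never correctness.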
