So we've got our HTML escape functions that really work in a C++ manner, how to do unescape?


Question

Here I've found a great way to HTML encode/escape special chars. Now I wonder how to unescape HTML-encoded text in C++?

So the code base is:

#include <algorithm>
#include <cstddef>
#include <iterator>

namespace xml {

    // Helper for null-terminated ASCII strings (no end of string iterator).
    template<typename InIter, typename OutIter>
    OutIter copy_asciiz ( InIter begin, OutIter out )
    {
        while ( *begin != '\0' ) {
            *out++ = *begin++;
        }
        return (out);
    }

    // XML escaping in its general form.  Note that 'out' is expected
    // to be an "infinite" sequence.
    template<typename InIter, typename OutIter>
    OutIter escape ( InIter begin, InIter end, OutIter out )
    {
        static const char bad[] = "&<>";
        static const char* rep[] = {"&amp;", "&lt;", "&gt;"};
        // Exclude the terminating '\0' of "bad"; it has no entry in "rep".
        static const std::size_t n = sizeof(bad)/sizeof(bad[0]) - 1;

        for ( ; (begin != end); ++begin )
        {
            // Find which replacement to use.
            const std::size_t i =
                std::distance(bad, std::find(bad, bad+n, *begin));

            // No need for escaping.
            if ( i == n ) {
                *out++ = *begin;
            }
            // Escape the character.
            else {
                out = copy_asciiz(rep[i], out);
            }
        }
        return (out);
    }

}

#include <istream>
#include <iterator>
#include <ostream>
#include <string>

namespace xml {

    // Get escaped version of "content".
    std::string escape ( const std::string& content )
    {
        std::string result;
        result.reserve(content.size());
        escape(content.begin(), content.end(), std::back_inserter(result));
        return (result);
    }

    // Escape data on the fly, using "constant" memory.
    void escape ( std::istream& in, std::ostream& out )
    {
        escape(std::istreambuf_iterator<char>(in),
            std::istreambuf_iterator<char>(),
            std::ostreambuf_iterator<char>(out));
    }

}

It works like this:

#include <iostream>

int main ( int, char ** )
{
    std::cout << xml::escape("<foo>bar & qux</foo>") << std::endl;
}



So I wonder: how do I unescape HTML in the same manner?

Recommended answer

Take a look at how I've solved a similar problem for '&#(\d+);' strings, i.e., numeric character references (NCRs), using boost::spirit, boost::regex_token_iterator, Flex, and Perl.
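For illustration only, here is a minimal sketch of decoding decimal NCRs. It does not reproduce any of the linked solutions: it swaps in std::regex from the C++11 standard library and assumes the output should be UTF-8. The names append_utf8 and decode_ncr are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: append code point "cp" to "out" as UTF-8.
// Returns false (and appends nothing) for surrogates and out-of-range values.
inline bool append_utf8 ( unsigned long cp, std::string& out )
{
    if ( cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF) ) {
        return false;
    }
    if ( cp < 0x80 ) {
        out += static_cast<char>(cp);
    }
    else if ( cp < 0x800 ) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if ( cp < 0x10000 ) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Hypothetical function: replace every decimal NCR ("&#233;") with UTF-8.
// References that do not name a valid code point are copied unchanged.
inline std::string decode_ncr ( const std::string& text )
{
    static const std::regex ncr("&#([0-9]+);");
    std::string result;
    std::string::const_iterator last = text.begin();
    for ( std::sregex_iterator it(text.begin(), text.end(), ncr), end;
          it != end; ++it )
    {
        const std::smatch& m = *it;
        // Copy the text between the previous match and this one.
        result.append(last, m[0].first);
        const std::string digits = m[1].str();
        // More than 7 digits cannot be a valid code point (max is 1114111).
        const bool ok = digits.size() <= 7
            && append_utf8(std::stoul(digits), result);
        if ( !ok ) {
            result.append(m[0].first, m[0].second);
        }
        last = m[0].second;
    }
    result.append(last, text.end());
    return result;
}
```

Hexadecimal references such as "&#xE9;" are deliberately left untouched, since the pattern above only covers the decimal form.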

In your case the regex is &(amp|lt|gt); if you don't need to convert all HTML entities.
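As a rough sketch of the reverse of escape() for just those three entities: instead of a regex, the version below scans by hand so that it also works with single-pass iterators such as std::istreambuf_iterator, in the same style as the question's code. The name unescape and the buffering strategy are assumptions, not part of the original code base.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

namespace xml {

    // Hypothetical counterpart to escape(): converts "&amp;", "&lt;" and
    // "&gt;" back to '&', '<' and '>'.  Anything else is copied verbatim.
    template<typename InIter, typename OutIter>
    OutIter unescape ( InIter begin, InIter end, OutIter out )
    {
        static const char* const rep[] = {"&amp;", "&lt;", "&gt;"};
        static const char raw[] = {'&', '<', '>'};
        static const std::size_t n = sizeof(raw)/sizeof(raw[0]);
        static const std::size_t longest = 5; // length of "&amp;"

        std::string pending; // characters seen since the last '&'
        for ( ; (begin != end); ++begin )
        {
            const char c = *begin;
            if ( c == '&' ) {
                // A new candidate starts; the old one was not an entity.
                out = std::copy(pending.begin(), pending.end(), out);
                pending.assign(1, c);
            }
            else if ( pending.empty() ) {
                *out++ = c;
            }
            else {
                pending += c;
                if ( c == ';' ) {
                    // Find which character this entity stands for.
                    const std::size_t i =
                        std::distance(rep, std::find(rep, rep+n, pending));
                    if ( i == n ) {
                        out = std::copy(pending.begin(), pending.end(), out);
                    }
                    else {
                        *out++ = raw[i];
                    }
                    pending.clear();
                }
                else if ( pending.size() >= longest ) {
                    // Too long to be one of the known entities.
                    out = std::copy(pending.begin(), pending.end(), out);
                    pending.clear();
                }
            }
        }
        // Flush an unterminated candidate at end of input.
        return std::copy(pending.begin(), pending.end(), out);
    }

    // Get unescaped version of "content".
    inline std::string unescape ( const std::string& content )
    {
        std::string result;
        result.reserve(content.size());
        unescape(content.begin(), content.end(), std::back_inserter(result));
        return (result);
    }

}
```

Because the input is consumed in a single pass, "&amp;lt;" comes out as "&lt;" rather than being unescaped twice. A stream overload could be added the same way as for escape(), passing std::istreambuf_iterator and std::ostreambuf_iterator.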
