How to decode a Feedburner result containing \x3c and so on


Question

Feedburner changed what its blog service returns. It now returns blocks of JavaScript similar to:

document.write("\x3cdiv class\x3d\x22feedburnerFeedBlock\x22 id\x3d\x22RitterInsuranceMarketingRSSv3iugf6igask14fl8ok645b6l0\x22\x3e");
document.write("\x3cul\x3e");
document.write("\x3cli\x3e\x3cspan class\x3d\x22headline\x22\x3e\x3ca href\x3d\x22

I want the raw HTML out of this. Previously I could simply use .Replace to strip out the document.write syntax, but I can't figure out what kind of encoding this is, or at least how to decode it with C#.

Edit: Well, this was a semi-nightmare to finally solve. Here's what I came up with, in case anyone has any improvements to offer:

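// Extension methods must be declared inside a static class.
public static char ConvertHexToASCII(this string hex)
{
    // Pass the parameter name, not its (null) value, to the exception.
    if (hex == null) throw new ArgumentNullException(nameof(hex));
    // Parse the hex digits (e.g. "3c") as a byte and map it to its character.
    return (char)Convert.ToByte(hex, 16);
}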

private string DecodeFeedburnerHtml(string html)
{
    var builder = new StringBuilder(html.Length);
    var stack = new Stack<char>(4);
    foreach (var chr in html)
    {
        switch (chr)
        {
            case '\\':
                if (stack.Count == 0)
                {
                    stack.Push(chr);
                }
                else
                {
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
            case 'x':
                if (stack.Count == 1)
                {
                    stack.Push(chr);
                }
                else
                {
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
            default:
                if (stack.Count >= 2)
                {
                    stack.Push(chr);

                    if (stack.Count == 4)
                    {
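                        // The two hex digits sit on top of the stack. The first Pop()
                        // returns the second digit, so {1}{0} restores the original order.
                        string hexString = string.Format("{1}{0}", stack.Pop(),
                                                     stack.Pop());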

                        builder.Append(hexString.ConvertHexToASCII());
                        stack.Clear();
                    }
                }
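                else
                {
                    // Reset a pending backslash that was not followed by 'x';
                    // otherwise a later plain 'x' would be mistaken for an escape.
                    stack.Clear();
                    builder.Append(chr);
                }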
                break;
        }
    }

    return builder.ToString();
}



I'm not sure what else I could do better. For some reason, code like this always feels really dirty to me even though it's a linear-time algorithm. I guess that comes down to how long it has to be.

Recommended answer

Those look like ASCII values encoded in hex. You could traverse the string and, whenever you find a \x followed by two hexadecimal digits (0-9, a-f), replace it with the corresponding ASCII character. If the string is long, it would be faster to build the result incrementally in a StringBuilder than to call String.Replace() repeatedly, since each Replace call allocates a new string.
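As a rough illustration of that traversal, here is a minimal sketch that lets a regular expression do the scanning in a single pass. The class and method names (FeedburnerDecoder, DecodeHexEscapes) are made up for this example, and it assumes the only escape form present is \x followed by exactly two hex digits. Values above 0x7F are cast straight to a char, which may or may not match what Feedburner intends.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any Feedburner or .NET API.
public static class FeedburnerDecoder
{
    // Matches a backslash, the letter x, and exactly two hex digits, e.g. \x3c.
    private static readonly Regex HexEscape =
        new Regex(@"\\x([0-9A-Fa-f]{2})", RegexOptions.Compiled);

    public static string DecodeHexEscapes(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // Replace each match with the character whose code is the hex value.
        return HexEscape.Replace(input, match =>
            ((char)int.Parse(match.Groups[1].Value, NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture)).ToString());
    }
}
```

Calling FeedburnerDecoder.DecodeHexEscapes(@"\x3cul\x3e") should give back the string <ul>.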

I don't know the encoding specification, but there might be more rules to follow (for example, \\ may be an escape sequence for a literal \).
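To show how such an extra rule could be folded in, here is a hedged sketch of a hand-written scanner that appends to a StringBuilder. It assumes exactly two rules: \\ stands for one literal backslash, and \xHH stands for the character with that hex code. Any other backslash sequence is copied through unchanged, which is a guess rather than documented Feedburner behaviour. The name JsStringUnescaper is invented for the example.

```csharp
using System;
using System.Text;

// Hypothetical helper; the escape rules below are assumptions, not a spec.
public static class JsStringUnescaper
{
    public static string Unescape(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var builder = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == '\\' && i + 1 < input.Length)
            {
                char next = input[i + 1];
                if (next == '\\')
                {
                    // Assumed rule: \\ stands for one literal backslash.
                    builder.Append('\\');
                    i += 2;
                    continue;
                }
                if (next == 'x' && i + 3 < input.Length
                    && Uri.IsHexDigit(input[i + 2]) && Uri.IsHexDigit(input[i + 3]))
                {
                    // \xHH: the two hex digits give the character code.
                    builder.Append((char)Convert.ToInt32(input.Substring(i + 2, 2), 16));
                    i += 4;
                    continue;
                }
            }
            // Anything else, including an unrecognised escape, is copied unchanged.
            builder.Append(c);
            i++;
        }
        return builder.ToString();
    }
}
```

The difference from a plain \xHH replacement shows up on input such as a\\x3cb (written in C# as @"a\\x3cb"): the escaped backslash is consumed first, so the result is a\x3cb rather than a backslash followed by <.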
