How do I read a secure RSS feed into a SyndicationFeed without providing credentials?

Question

For whatever reason, IBM uses https (without requiring credentials) for their RSS feeds. I'm trying to consume https://www.ibm.com/developerworks/mydeveloperworks/blogs/roller-ui/rendering/feed/gradybooch/entries/rss?lang=en with a .NET 4 SyndicationFeed. I can open this feed in a browser and it loads just fine. Here's the code:

        using (XmlReader xml = XmlReader.Create("https://www.ibm.com/developerworks/mydeveloperworks/blogs/roller-ui/rendering/feed/gradybooch/entries/rss?lang=en"))
        {
            var items = from item in SyndicationFeed.Load(xml).Items
                        select item;
        }

Here's the exception:

System.Net.WebException was unhandled by user code
Message=The remote server returned an error: (500) Internal Server Error.
Source=System
StackTrace:
   at System.Net.HttpWebRequest.GetResponse()
   at System.Xml.XmlDownloadManager.GetNonFileStream(Uri uri, ICredentials credentials, IWebProxy proxy, RequestCachePolicy cachePolicy)
   at System.Xml.XmlDownloadManager.GetStream(Uri uri, ICredentials credentials, IWebProxy proxy, RequestCachePolicy cachePolicy)
   at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri, String role, Type ofObjectToReturn)
   at System.Xml.XmlReaderSettings.CreateReader(String inputUri, XmlParserContext inputContext)
   at System.Xml.XmlReader.Create(String inputUri, XmlReaderSettings settings, XmlParserContext inputContext)
   at System.Xml.XmlReader.Create(String inputUri)
   at EDN.Util.Test.FeedAggTest.LoadFeedInfoTest() in D:\cdn\trunk\CDN\Dev\Shared\net\EDN.Util\EDN.Util.Test\FeedAggTest.cs:line 126

How do I configure the reader to work with an https feed?

Recommended answer

I don't think it has anything to do with security. A 500 error is a server-side error. Something in the request generated by XmlReader.Create(url) is confusing the IBM website. If it were simply a security issue, as suggested in your question, then you'd expect a 401 (Unauthorized) or 403 (Forbidden) response. But you got 500, which is an application error.
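
To make that distinction concrete, here is a minimal sketch of how a client could triage the failure before guessing at the cause. The class and method names (StatusTriage, Categorize, CategorizeFailure) and the category labels are made up for illustration; only HttpStatusCode, WebException and HttpWebResponse come from the framework.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original answer.
public static class StatusTriage
{
    // Maps a status code to a rough category.
    public static string Categorize(HttpStatusCode status)
    {
        int code = (int)status;
        if (code == 401 || code == 403) return "auth";   // credentials or permission problem
        if (code >= 500) return "server";                // the server failed to handle the request
        if (code >= 400) return "client";                // some other problem with the request
        return "ok";
    }

    // Pulls the status out of a WebException, if the server answered at all.
    public static string CategorizeFailure(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        return response == null ? "no-response" : Categorize(response.StatusCode);
    }

    public static void Main()
    {
        Console.WriteLine(Categorize(HttpStatusCode.InternalServerError));
        Console.WriteLine(Categorize(HttpStatusCode.Forbidden));
    }
}
```

In a real call you would wrap GetResponse() (or XmlReader.Create(url)) in a try/catch for WebException and pass the exception to CategorizeFailure.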

Even so, maybe there's something the client app can do, to avoid confusing the server.

I looked at the outgoing HTTP request headers, using Fiddler. For a request generated by IE, the headers look like this:

GET https://www.ibm.com/developerworks/mydeveloperworks/blogs/roller-ui/rendering/feed/gradybooch/entries/rss?lang=en HTTP/1.1
Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-silverlight, application/x-shockwave-flash, application/x-silverlight-2-b2, */*
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0; .NET CLR 3.5.30729;)
Accept-Encoding: gzip, deflate
Host: www.ibm.com
Connection: Keep-Alive
Cookie: UnicaNIODID=Ww06gyvyPpZ-WPl6K7y; conxnsCookie=en; IBMPOLLCOOKIE=""; UnicaNIODID=QridYHCNf7M-WYM8Usr

For a request from XmlReader.Create(url), the headers look like this:

GET https://www.ibm.com/developerworks/mydeveloperworks/blogs/roller-ui/rendering/feed/gradybooch/entries/rss?lang=en HTTP/1.1
Host: www.ibm.com
Connection: Keep-Alive

Quite a difference. Also, in the response to the latter, I got a Set-Cookie header, in the 500 response, which wasn't present in the response to IE.

Based on that I theorized that it was the difference in request headers, in particular the cookie, that was confusing ibm.com.

I don't know how to convince XmlReader.Create() to embed all the request headers I wanted, including the cookie. But I know how to do that with an HttpWebRequest. So I used that.

There were a few hurdles I had to clear.

First, I needed the persistent cookie for ibm.com. For that I had to resort to a p/invoke of the Win32 InternetGetCookie function. See the PersistentCookies class in the user-contributed content at the bottom of the doc page for WebRequest for how to do that. After attaching the cookie, I was no longer getting 500 errors. Hooray!
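
I won't reproduce the PersistentCookies class here, but the managed half of the job, turning a "name=value; name=value" cookie string into a CookieContainer for the request, can be sketched as below. This is an assumption about the general shape of the task, not the code of that class; CookieHeaderDemo and FromHeader are hypothetical names, and getting the cookie string itself (for example from InternetGetCookie) is left out.

```csharp
using System;
using System.Net;

// Hypothetical helper, not the PersistentCookies class referenced above.
public static class CookieHeaderDemo
{
    // Turns "a=1; b=2" into a CookieContainer scoped to the given URL.
    public static CookieContainer FromHeader(string url, string cookieHeader)
    {
        var uri = new Uri(url);
        var container = new CookieContainer();
        foreach (string part in cookieHeader.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = part.IndexOf('=');
            if (eq <= 0) continue;                      // skip fragments without a name
            string name = part.Substring(0, eq).Trim();
            string value = part.Substring(eq + 1).Trim();
            container.Add(uri, new Cookie(name, value)); // domain is taken from the URI
        }
        return container;
    }

    public static void Main()
    {
        string url = "https://www.ibm.com/developerworks/mydeveloperworks/blogs/roller-ui/rendering/feed/gradybooch/entries/rss?lang=en";
        CookieContainer cookies = FromHeader(url, "UnicaNIODID=Ww06gyvyPpZ-WPl6K7y; conxnsCookie=en");
        Console.WriteLine(cookies.GetCookieHeader(new Uri(url)));
    }
}
```

The resulting container is what gets assigned to HttpWebRequest.CookieContainer before calling GetResponse().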

But the resulting stream could not be read by XmlReader.Create(). It looked binary to me. I realized I needed to decompress the gzip or deflate content. My first approach was to wrap a GZipStream or DeflateStream around the received response stream and hand the decompressing stream to XmlReader; setting the AutomaticDecompression property (http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.automaticdecompression.aspx) on HttpWebRequest is simpler. I could have avoided the need for this by not including "gzip, deflate" in the Accept-Encoding header of the outbound request. In fact, once AutomaticDecompression is set, that header is added to the outbound HTTP request implicitly.
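
For anyone curious what the manual route looks like, here is a small offline sketch of wrapping a GZipStream around a compressed body before handing it to XmlReader. It fakes the server side by compressing a tiny made-up RSS string in memory; GzipDemo, Compress and ReadRootName are hypothetical names used only for this illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

// Hypothetical demo class; simulates a gzip-encoded response body in memory.
public static class GzipDemo
{
    // Stands in for the server: gzip-compresses a string into a byte array.
    public static byte[] Compress(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            return buffer.ToArray();   // ToArray still works after the stream is closed
        }
    }

    // Reads the root element name, decompressing first when the body is gzip-encoded.
    public static string ReadRootName(Stream body, bool isGzip)
    {
        Stream source = isGzip ? new GZipStream(body, CompressionMode.Decompress) : body;
        using (source)
        using (XmlReader reader = XmlReader.Create(source))
        {
            reader.MoveToContent();
            return reader.LocalName;
        }
    }

    public static void Main()
    {
        byte[] compressed = Compress("<rss version='2.0'><channel/></rss>");
        Console.WriteLine(ReadRootName(new MemoryStream(compressed), true));
    }
}
```

With a real response you would decide between GZipStream and DeflateStream by looking at the Content-Encoding response header, which is exactly the bookkeeping AutomaticDecompression does for you.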

When I did that, I got actual text. But some of the byte codes were off. Next I needed to use the proper text encoding in the TextReader, as indicated in the HttpWebResponse.
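
Here is a minimal sketch of that step: pick the Encoding from the charset the response reports, and fall back to UTF-8 when it is missing or unrecognized. The fallback and the quote trimming are my own assumptions rather than anything the server requires, and EncodingDemo and ResolveEncoding are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper for choosing the text encoding of a response body.
public static class EncodingDemo
{
    // charset would normally come from HttpWebResponse.CharacterSet.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset)) return Encoding.UTF8;
        try
        {
            // Some servers quote the charset value; strip quotes before the lookup.
            return Encoding.GetEncoding(charset.Trim().Trim('"', '\''));
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;   // unknown charset name: assume UTF-8
        }
    }

    public static void Main()
    {
        byte[] latin1Bytes = { 0x63, 0x61, 0x66, 0xE9 };   // "caf" + e-acute in ISO-8859-1
        Console.WriteLine(ResolveEncoding("ISO-8859-1").GetString(latin1Bytes));
        Console.WriteLine(ResolveEncoding(null).WebName);
    }
}
```

Decoding those same four bytes as UTF-8 would give a replacement character instead of the accented letter, which is the kind of "off" byte code described above.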

After doing that, I got a sensible string, but the resulting decompressed rss stream caused the XmlReader to choke, with

ReadElementString method can only be called on elements with simple or empty content. Line 11, position 25.

I looked and found a small <script> block, at that location, within the <copyright> element in the RSS document. It seems IBM is trying to get the browser to "localize" the copyright date by attaching logic that would run in the browser to format the date. Seems like overkill to me, or even a bug by IBM. But because the markup inside what should be a text-only element bothered the XmlReader, I removed the script block with a Regex replace.
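
The failure and the fix can be reproduced offline with a made-up <copyright> fragment, as in the sketch below. The sample string is an assumption about what the feed roughly contained, ScriptStripDemo and its methods are hypothetical names, and the pattern is a bit more general than the one I actually used (it accepts any attributes on the script tag).

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical demo: shows ReadElementString failing on a nested <script> element.
public static class ScriptStripDemo
{
    // Removes every <script ...>...</script> block; the lazy .*? keeps each match short.
    public static string StripScriptBlocks(string xml)
    {
        return Regex.Replace(xml, @"<script\b[^>]*>.*?</script>", "",
                             RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

    // Reads the root element as plain text, which only works for text-only content.
    public static string ReadCopyright(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementString();
        }
    }

    public static void Main()
    {
        string raw = "<copyright>Copyright <script type='text/javascript'>" +
                     "document.write(new Date().getFullYear());</script> IBM</copyright>";
        try
        {
            ReadCopyright(raw);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Raw fragment failed: " + ex.Message);
        }
        Console.WriteLine(ReadCopyright(StripScriptBlocks(raw)));
    }
}
```

A regex is a blunt tool for editing XML, but for a single known offender in a feed you don't control it keeps the workaround short.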

After clearing those hurdles, it worked. The .NET app was able to read the RSS stream from that https url.

I didn't do any further testing - to see if varying the Accept header or the Accept-Encoding header would change the behavior. That's for you to figure out, if you care.

The resulting code is below. It's much uglier than your simple 3-liner. I don't know how to make it any simpler.

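// Namespaces used: System.IO, System.Linq, System.Net, System.ServiceModel.Syndication,
// System.Text, System.Text.RegularExpressions, System.Xml.
// PersistentCookies is the user-contributed helper class mentioned above.
public void Run()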
{
    string url;
    url = "https://www.ibm.com/developerworks/mydeveloperworks/blogs/roller-ui/rendering/feed/gradybooch/entries/rss?lang=en";

    HttpWebRequest hwr = (HttpWebRequest) WebRequest.Create(url);
    // attach persistent cookies
    hwr.CookieContainer =
        PersistentCookies.GetCookieContainerForUrl(url);
    hwr.Accept = "text/xml, */*";
    hwr.Headers.Add(HttpRequestHeader.AcceptLanguage, "en-us");
    hwr.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; .NET CLR 3.5.30729;)";
    hwr.KeepAlive = true;
    hwr.AutomaticDecompression = DecompressionMethods.Deflate |
                                 DecompressionMethods.GZip;

    using (var resp = (HttpWebResponse) hwr.GetResponse())
    {
        using(Stream s = resp.GetResponseStream())
        {            
            string cs = String.IsNullOrEmpty(resp.CharacterSet) ? "UTF-8" : resp.CharacterSet;
            Encoding e = Encoding.GetEncoding(cs);

            using (StreamReader sr = new StreamReader(s, e))
            {
                var allXml = sr.ReadToEnd();

                // remove any script blocks - they confuse XmlReader
                // (lazy match without greedy groups, so every block is removed)
                allXml = Regex.Replace( allXml,
                                        "<script type='text/javascript'>.+?</script>",
                                        "",
                                        RegexOptions.Singleline);

                using (XmlReader xmlr = XmlReader.Create(new StringReader(allXml)))
                {
                    var items = from item in SyndicationFeed.Load(xmlr).Items
                        select item;
                }
            }
        }
    }
}
