Read a web page's source and return its news topics and summaries

Problem description

Hi there.

I want to write a C# program that connects to a news web page and returns its topics and, of course, its news summaries.

How can I write this? I wrote the code below, but I don't know whether it is right. It returns all the links, but I want to return just the topics and summaries.

Thanks.

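using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Placeholder namespace name: the original declaration was cut from the post,
// and only its closing brace survived at the end of the listing.
namespace NewsLinks
{
     public struct LinkItem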
     {
         public string Href;
         public string Text;

         public override string ToString()
         {
             return Href + "\n\t" + Text;
         }
     }

     static class LinkFinder
     {
         public static List<LinkItem> Find(string file)
         {
             List<LinkItem> list = new List<LinkItem>();

             // 1.
             // Find all matches in file.
             MatchCollection m1 = Regex.Matches(file, @"(<a.*?>.*?</a>)",
                 RegexOptions.Singleline);

             // 2.
             // Loop over each match.
             foreach (Match m in m1)
             {
                 string value = m.Groups[1].Value;
                 LinkItem i = new LinkItem();

                 // 3.
                 // Get href attribute.
                 Match m2 = Regex.Match(value, @"href=\""(.*?)\""",
                 RegexOptions.Singleline);
                 if (m2.Success)
                 {
                     i.Href = m2.Groups[1].Value;
                 }

                 // 4.
                 // Remove inner tags from text.
                  string t = Regex.Replace(value, @"\s*<.*?>\s*", "",
                 RegexOptions.Singleline);
                 i.Text = t;

                 list.Add(i);
             }
             return list;
         }
         static void Main(string[] args)
         {
             // Download the page source. WebClient is disposable, so wrap it in using.
             string s;
             using (WebClient w = new WebClient())
             {
                 s = w.DownloadString("http://www.bbc.co.uk/news");
             }

             // Write one entry per link straight to the output file.
             // The using block flushes and closes the file when it ends.
             using (StreamWriter sw = new StreamWriter(@"E:\oop1.txt"))
             {
                 foreach (LinkItem i in LinkFinder.Find(s))
                 {
                     sw.WriteLine(i);
                 }
             }
         }
     }
 }

Recommended answer

Read the SiteMapper Tool article, especially the section "Traversing the Web Site".

Hope that helps.
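
As an alternative that is not from the original thread: if the news site publishes an RSS 2.0 feed, each item in it already carries a title and a description, which may be closer to "topics and summaries" than scraping every link out of the HTML. The sketch below assumes such a feed is available; the names NewsItem and FeedReader are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical type for this sketch; it is not part of the original post.
public struct NewsItem
{
    public string Topic;
    public string Summary;

    public override string ToString()
    {
        return Topic + "\n\t" + Summary;
    }
}

// Hypothetical helper, assuming the input is RSS 2.0 XML
// (un-namespaced <item>, <title> and <description> elements).
public static class FeedReader
{
    // Returns one NewsItem per <item> element found in the feed.
    public static List<NewsItem> Parse(string rssXml)
    {
        XDocument doc = XDocument.Parse(rssXml);
        return doc.Descendants("item")
            .Select(item => new NewsItem
            {
                // Casting a missing element to string yields null,
                // so fall back to an empty string.
                Topic = ((string)item.Element("title") ?? "").Trim(),
                Summary = ((string)item.Element("description") ?? "").Trim()
            })
            .ToList();
    }
}
```

To try it, download the feed XML with WebClient.DownloadString, as in the question's Main method, and pass the result to FeedReader.Parse. The feed address is an assumption to check on the site itself; for BBC News it is probably http://feeds.bbci.co.uk/news/rss.xml.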

