Reading a web page's source and returning its topics and news summaries
Problem description
Hi there.
I want to write a program in C# that connects to a news web page and returns its topics and, of course, its news summaries.
How can I write this code? I wrote the code below, but I don't know whether it is right.
It returns all the links, but I want it to return just the topics and summaries.
Thanks
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

namespace LinkFinderDemo
{
    // One hyperlink found in the page: its target URL and its visible text.
    public struct LinkItem
    {
        public string Href;
        public string Text;

        public override string ToString()
        {
            return Href + "\n\t" + Text;
        }
    }

    static class LinkFinder
    {
        public static List<LinkItem> Find(string file)
        {
            List<LinkItem> list = new List<LinkItem>();

            // 1.
            // Find every <a ...>...</a> element in the page.
            MatchCollection m1 = Regex.Matches(file, @"(<a.*?>.*?</a>)",
                RegexOptions.Singleline);

            // 2.
            // Loop over each match.
            foreach (Match m in m1)
            {
                string value = m.Groups[1].Value;
                LinkItem i = new LinkItem();

                // 3.
                // Get the href attribute.
                Match m2 = Regex.Match(value, @"href=\""(.*?)\""",
                    RegexOptions.Singleline);
                if (m2.Success)
                {
                    i.Href = m2.Groups[1].Value;
                }

                // 4.
                // Strip all tags to leave only the link text.
                string t = Regex.Replace(value, @"\s*<.*?>\s*", "",
                    RegexOptions.Singleline);
                i.Text = t;
                list.Add(i);
            }
            return list;
        }

        static void Main(string[] args)
        {
            string s;
            // WebClient is IDisposable, so release it once the download is done.
            using (WebClient w = new WebClient())
            {
                s = w.DownloadString("http://www.bbc.co.uk/news");
            }

            // Write directly to the file. The original wrote through a MemoryStream
            // into a FileStream from File.Create that was never closed; the using
            // block here flushes and closes the file.
            using (StreamWriter sw = new StreamWriter(@"E:\oop1.txt"))
            {
                foreach (LinkItem i in LinkFinder.Find(s))
                {
                    sw.WriteLine(i);
                }
            }
        }
    }
}
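The code above matches every <a> element, so it cannot tell headlines apart from navigation and footer links. One alternative, not part of the original thread, is to read the site's RSS feed, where each <item> already carries a <title> (the topic) and a <description> (the summary). A minimal sketch, assuming the feed URL http://feeds.bbci.co.uk/news/rss.xml is still published (check the site for its current feed link) and that descriptions are plain text; the class name RssHeadlines is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;

public static class RssHeadlines
{
    // Parses RSS 2.0 XML and returns one (Title, Summary) pair per <item>.
    // Missing <title> or <description> elements become empty strings.
    public static List<(string Title, string Summary)> Parse(string rssXml)
    {
        // DownloadString can leave a byte-order mark in front of <?xml ...?>,
        // which XDocument.Parse rejects, so strip it first.
        XDocument doc = XDocument.Parse(rssXml.TrimStart('\uFEFF'));
        return doc.Descendants("item")
            .Select(item => (
                Title: ((string)item.Element("title") ?? "").Trim(),
                Summary: ((string)item.Element("description") ?? "").Trim()))
            .ToList();
    }

    public static void Main()
    {
        using (var client = new WebClient())
        {
            // Assumption: the feed is UTF-8 encoded.
            client.Encoding = Encoding.UTF8;
            // Assumed feed URL; replace it with the site's current RSS link.
            string xml = client.DownloadString("http://feeds.bbci.co.uk/news/rss.xml");
            foreach (var (title, summary) in Parse(xml))
            {
                Console.WriteLine(title + "\n\t" + summary);
            }
        }
    }
}
```

If a site offers no feed, the regex approach can only be narrowed with site-specific rules (for example, keeping links whose href follows the site's article URL pattern), and those rules break whenever the page layout changes.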
Recommended answer
Read the SiteMapper Tool article, especially the section "Traversing the Web Site".
Hope that helps.
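Traversing a site, as that section describes, comes down to a breadth-first walk: fetch a page, extract its links, queue the ones on the same host that have not been visited yet, and repeat until a page limit is reached. A minimal sketch of that idea, not code from the SiteMapper article; the class name SiteTraverser, the same-host rule and the maxPages limit are illustrative choices, and links are extracted with the same kind of href regex as the question's code:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class SiteTraverser
{
    // Breadth-first walk over pages on the same host as start.
    // fetch returns the HTML for a URL; it is passed in so a real download
    // or a test stub can be used. Stops after maxPages pages are visited.
    public static List<Uri> Traverse(Uri start, Func<Uri, string> fetch, int maxPages)
    {
        var visited = new HashSet<Uri>();
        var order = new List<Uri>();
        var queue = new Queue<Uri>();
        queue.Enqueue(start);

        while (queue.Count > 0 && order.Count < maxPages)
        {
            Uri current = queue.Dequeue();
            if (!visited.Add(current)) continue; // already visited

            order.Add(current);
            string html = fetch(current);

            foreach (Match m in Regex.Matches(html, "href=\"(.*?)\"", RegexOptions.Singleline))
            {
                // Resolve relative links such as "/a" or "b" against the current page.
                if (!Uri.TryCreate(current, m.Groups[1].Value, out Uri next)) continue;

                // Keep only http(s) links on the starting host (skips mailto:, other sites).
                bool isWeb = next.Scheme == Uri.UriSchemeHttp || next.Scheme == Uri.UriSchemeHttps;
                if (!isWeb || next.Host != start.Host) continue;

                // Drop any #fragment so page#top and page count as one page.
                next = new Uri(next.GetLeftPart(UriPartial.Query));
                if (!visited.Contains(next)) queue.Enqueue(next);
            }
        }
        return order;
    }

    public static void Main()
    {
        using (var client = new WebClient())
        {
            // Start URL and page limit are illustrative; a failed download throws WebException.
            foreach (Uri page in Traverse(new Uri("http://www.bbc.co.uk/news"), u => client.DownloadString(u), 5))
            {
                Console.WriteLine(page);
            }
        }
    }
}
```

Each visited page can then be passed to a link or headline extractor; the page limit and host check keep the walk from wandering across the whole web.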